Abstract: This paper aims to go beyond resilience into the 
study of security and locality for distributed storage systems. 
Security and locality are both important as features of an efficient 
storage system, and this paper aims to understand the tradeoffs 
between resilience, security and locality in these systems. In 
particular, this paper first investigates security in the presence 
of colluding eavesdroppers, where eavesdroppers are assumed to 
work together in decoding stored information. Second, the paper 
focuses on coding schemes that enable optimal local repairs. It 
further brings these two concepts together, to develop locally- 
repairable coding schemes for DSS that are secure against 
eavesdroppers. 

The main results of this paper include: a. an improved
bound on the secrecy capacity for minimum storage regenerating
codes, b. secure coding schemes that achieve the bound for
some special cases, c. a new minimum distance bound for locally
repairable codes, d. a code construction for locally repairable
codes that achieves the minimum distance bound, and
e. repair-bandwidth-efficient locally repairable codes with and
without security constraints.

Index Terms — Coding for distributed storage systems, locally 
repairable codes, repair bandwidth efficient codes, security. 



I. Introduction 

A. Background 

Distributed storage systems (DSS) are of increasing importance,
given the vast amounts of data being generated and accessed
worldwide. OceanStore [1], Google File System (GFS) [2] and
TotalRecall [3] are a few examples of existing DSS. An essential
component of DSS is resilience to node failures, which is why
every DSS today incorporates a mechanism to protect against
failures, thus preventing permanent loss of data stored using the
system. Typically, this resilience is afforded by replication and,
in recent years, by coding approaches.

Node failures are one of the many design challenges faced 
by DSS. There are two other challenges, arguably of equal 
importance: security and locality. Due to the decentralized 
nature of such systems, it is important that they be secured 
against a variety of possible attacks. Our focus in this paper is 
on passive eavesdroppers located at multiple nodes in the DSS 
that can collude in attempting to gain an understanding of the 
stored data. In addition to being decentralized, DSS are often
widely distributed geographically, and therefore locality in
storage proves very useful. In this paper, we develop a deeper
understanding of locality in storage, and subsequently combine
locality and security to develop codes for secure
locally-repairable DSS.

The security of communication or storage systems can be analyzed
in terms of their resilience to active or passive attacks [4],
[5]. Active attacks in such systems include settings where the
adversary modifies existing packets or injects new ones into the
system, whereas passive attack models include eavesdroppers
observing the messages being stored or transmitted. For DSS,
cryptographic approaches are often ineffective, as key
distribution and management between all nodes in the system is
extremely challenging to accomplish. A coding/information
theoretic approach to security is desired, which typically offers
stronger security guarantees than cryptographic schemes [6], [7]
and, in this context, is logistically easier to realize than
mechanisms that require key management. A secrecy-enabling coding
scheme is designed based on a worst-case estimate of the
information leaked to eavesdroppers, and can naturally complement
other existing coding schemes being utilized in distributed
storage systems. In its simplest form, security against an
eavesdropper can be achieved using a one-time pad scheme [8]. For
example, consider that the contents of two nodes are given by
$X_1 = R$ and $X_2 = R \oplus D$, where $R$ is a uniformly random
bit and $D$ is the data bit. Then, by contacting both nodes, one
can clearly obtain the data by computing $X_1 \oplus X_2$.
However, one cannot get any information about the data bit by
observing any one of the two nodes, as $I(X_i; D) = 0$ for
$i = 1, 2$, i.e., the mutual information between the data and the
content of one of the nodes is zero. Thus, the information
theoretic approach clearly has significant value in securing DSS.
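
The two-node one-time pad example can be checked numerically. The following is a minimal sketch, assuming the data bit is uniformly distributed; the helper name `mutual_information` is introduced here purely for illustration. It enumerates all equally likely (data bit, random bit) pairs and computes the mutual information between the data bit and each node's content.

```python
from itertools import product
from math import log2


def mutual_information(pairs):
    """Mutual information I(A;B) in bits for a list of equally likely (a, b) outcomes."""
    n = len(pairs)
    joint, pa, pb = {}, {}, {}
    for a, b in pairs:
        joint[(a, b)] = joint.get((a, b), 0) + 1 / n
        pa[a] = pa.get(a, 0) + 1 / n
        pb[b] = pb.get(b, 0) + 1 / n
    return sum(p * log2(p / (pa[a] * pb[b])) for (a, b), p in joint.items())


# Enumerate every equally likely combination of data bit D and pad bit R.
# Assumption: D is uniform, which the text does not state explicitly.
outcomes = []
for d, r in product((0, 1), repeat=2):
    x1 = r          # node 1 stores the random bit
    x2 = r ^ d      # node 2 stores the padded data bit
    outcomes.append((d, x1, x2))

# Contacting both nodes always recovers the data bit.
recoverable = all(x1 ^ x2 == d for d, x1, x2 in outcomes)

# Information about D leaked by node 1 alone, node 2 alone, and both together.
i_x1 = mutual_information([(x1, d) for d, x1, x2 in outcomes])
i_x2 = mutual_information([(x2, d) for d, x1, x2 in outcomes])
i_both = mutual_information([((x1, x2), d) for d, x1, x2 in outcomes])
print(recoverable, i_x1, i_x2, i_both)
```

Each node on its own leaks nothing about the data bit, while the pair of nodes carries the full bit.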

Local repairability of DSS is an additional property, which can
be one of the primary design criteria for the system. The
corresponding performance metric associated with a coding scheme
is its locality $r$, which is defined as the number of nodes that
must participate in a repair process when a particular node
fails. Small locality means that fewer nodes are involved in the
node repair process, which makes the entire process easier from a
logistical perspective. In addition, locality is of significant
interest when a cost is associated with contacting each node in
the system. Locality, in its simplest form, can be accomplished
by splitting the data into groups, each of which is coded and
stored separately. However, this naive approach requires
connecting to all the groups in order to retrieve the whole data,
and may not be the most efficient in terms of performance.
Therefore, there is a growing interest in more sophisticated
mechanisms for achieving locality in DSS. Moreover, systems
designed with locality in mind can also present benefits in terms
of security. In other words, locality and security against
eavesdropper attacks go hand in hand, and a joint design of both
features can prove to be particularly useful, as we illustrate in
this paper.
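
The naive grouping approach described above can be illustrated with a toy example. This is a sketch under simplifying assumptions, not a scheme from this paper: each group of `r` data blocks gets a single XOR parity block, and the function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def encode_groups(data_blocks, r):
    """Split the data blocks into groups of r and append one XOR parity per group."""
    groups = []
    for start in range(0, len(data_blocks), r):
        group = list(data_blocks[start:start + r])
        groups.append(group + [xor_blocks(group)])
    return groups


def repair(group, failed_index):
    """Rebuild one lost block using only the other blocks of the same group."""
    helpers = [b for i, b in enumerate(group) if i != failed_index]
    return xor_blocks(helpers), len(helpers)


blocks = [b"ab", b"cd", b"ef", b"gh"]
groups = encode_groups(blocks, r=2)

# Repairing the first block of the second group contacts only r = 2 nodes.
rebuilt, contacted = repair(groups[1], failed_index=0)
print(rebuilt, contacted)
```

A single failure is repaired inside its own group, but reading the whole file still requires touching every group, which is the inefficiency the paragraph points out.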

In DSS, encoding data before storing it provides the same level
of resilience against node failures as the conventional approach
of uncoded replication, but with much less storage space. The
savings in storage space may, however, come at the cost of
degrading other performance metrics. One such metric is repair
bandwidth, which refers to the amount of data that needs to be
transferred in the event of a single node failure in order to
regenerate the data on the failed node. This metric is highly
relevant, as a large fraction of network bandwidth in DSS can be
occupied by the data being transferred during the repair process.
Thus, it is desirable to have coding schemes with small repair
bandwidth. Most of the maximum distance separable (MDS) codes
designed for DSS encode $k$ data blocks into $n$ encoded blocks
and store each encoded block on a different node. This naive
approach entails a high repair bandwidth, as the entire original
file needs to be reconstructed in order to regenerate the encoded
data stored at a particular storage node. In [9], Dimakis et al.
explore this problem and establish a trade off between per node
storage and repair bandwidth for a code that has the MDS ("any k
out of n") property, i.e., the entire data can be reconstructed
by a data collector contacting any $k$ storage nodes. This new
class of codes is referred to as regenerating codes, and allows
for trading off repair bandwidth for storage [9]. Utilizing a
network coding approach, the notion of functional repair is
considered in [9], where the content of the failed node may not
be replicated exactly, but is replaced by encoded data that is
functionally equivalent. However, it is desirable to perform
exact repair in DSS, where the data regenerated after the repair
process is an exact replica of what was stored on the failed
node. This is essential for ease of maintenance and other
practical purposes, e.g., maintaining a code in its systematic
form. Exact repair is also advantageous compared to functional
repair in the presence of eavesdroppers, as the latter requires
updating the coding rules, which may leak additional information
to eavesdroppers [10]. Noting the resilience of exact repair to
eavesdropping attacks and its necessity for practical purposes,
it is of significant interest to design regenerating codes that
not only enjoy an optimal trade off in repair bandwidth vs.
storage, but also satisfy exact repair in addition to security
and/or locality constraints.

B. Contributions and Organization 

In this paper, we consider secure and locally repairable
regenerating codes for DSS. As a security constraint, we adopt
the passive and colluding eavesdropper model presented in [11],
where, during the entire life span of the DSS, the eavesdropper
can get access to the data stored on $\ell_1$ nodes, and, in
addition, it observes both the stored content and the data
downloaded (for repair) on an additional $\ell_2$ nodes. This
attack model generalizes the eavesdropper model proposed in [10],
which considers the case of $\ell_2 = 0$. As the amount of
information downloaded when a node repair is in progress is equal
to the information stored on the repaired node for minimum
bandwidth regenerating codes, the two notions are different only
at the minimum storage regenerating point.

With this general eavesdropper model, we extend the existing
results on the design of secure minimum storage regenerating
codes for DSS. First, we derive an upper bound on secrecy
capacity, the amount of data that can be stored on the system
without leaking information to an eavesdropper, for a DSS
employing bandwidth efficient node repair. Our bound is novel in
that it can take into account the additional downloaded data at
the eavesdroppers, and is tighter than the available bounds in
the literature. Second, we present a secure, exact repairable
coding scheme that has a higher code rate compared to that of
[11]. Utilizing a special case of the obtained bound, we show
that our codes achieve the optimal secure file size for any
$(\ell_1, \ell_2)$ when $\ell_2 \le 2$.

Third, we shift focus to locally repairable regenerating codes.
We derive an upper bound on the minimum distance of vector codes,
possibly non-linear, that satisfy a given locality constraint. We
develop this bound using the proof technique used in [12],
[13].¹ Fourth, based on maximum rank distance (MRD) codes, we
construct a coding scheme which achieves this bound on minimum
distance. Here, we establish a per node storage vs. resilience
trade off similar to [13], and study bandwidth efficiency in
locally repairable DSS. We present a minimum distance optimal,
repair bandwidth efficient coding scheme. Finally, we consider
the problem of providing secrecy against a passive eavesdropper
for locally repairable codes and present a secure locally
repairable regenerating code for DSS by modifying the
aforementioned coding scheme.

In all the scenarios we study in this paper, the achievability
results allow for exact repair, and we obtain secure file size
upper bounds from min-cut analyses over the secrecy graph
representation of distributed storage systems. Our main secrecy
achievability coding arguments are obtained by utilizing a secret
sharing scheme with MRD codes, similar to the classical work of
[15].

The rest of the paper is organized as follows. In the next
subsection, we provide a summary of work related to the problems
studied in this paper. In Section II, we provide a general
system model together with some preliminary results utilized 
throughout the text. In Section III, we reproduce a classical 
setup for the problem, and provide an enhanced upper bound 
on secure file size as well as a new secure coding scheme 
for minimum storage regenerating codes. In Section IV, we 
focus on locally repairable codes, providing new bounds on 
minimum distance of such codes. We also present a new 
coding scheme that achieves these bounds. In Section V, 
we present locally repairable codes with security constraints. 
Finally, we conclude the paper in Section VI. To improve the 
presentation of the paper, some of the results and proofs are 
relegated to appendices. 

C. Related Work 

In [9], Dimakis et al. characterize the information theoretic
trade off between repair bandwidth and per node storage for DSS
satisfying the MDS ("any k out of n") property. Based on network
coding results, functional repair is considered, and the life
span of DSS, for a given set of node failures, is mapped to a
multicast problem over a dynamic network. Using this mapping, the
authors show that network coding based storage schemes achieve
the lower bound on repair bandwidth allowing "functional repair"
[9]. [16] and [17] present coding schemes that achieve the lower
bound on repair bandwidth. The works in [18]-[20] devise low rate
codes, which achieve the lower bound derived in [9] when data is
downloaded from all surviving nodes during exact node repair. The
coding schemes in [18] and [19], [20] are tailored for $k \le 3$
and $k \le n/2$, respectively. In [21], Rashmi et al. design
exact-repairable codes, which allow node repair to be performed
by contacting $d \le n - 1$ surviving nodes. These codes are
optimal for all parameters $(n, k, d)$ at the minimum bandwidth
regeneration (MBR) point. At the minimum storage regeneration
(MSR) point, these codes belong to the low rate regime, as their
rate is upper bounded by $\frac{1}{2} + \frac{1}{2n}$. Recently,
researchers have devised high rate exact repairable codes for the
MSR point. [22] presents codes for DSS with two parity nodes,
which accomplish exact regeneration while being optimal in repair
bandwidth. In [23] and [24], permutation-matrix based codes are
designed to achieve the bound on repair bandwidth for systematic
node repair for all $(n, k)$ pairs. [25] further generalizes the
idea of [24] to get MDS array codes for DSS that allow optimal
exact regeneration for parity nodes as well.

¹ This also shows that the proof technique used in [14] based on
generalized Hamming weights, which only works for systematic
codes, is not essential.

Towards obtaining coding schemes with "good" locality, Oggier et
al. present coding schemes which facilitate local node repair in
[26], [27]. In [12], Gopalan et al. establish an upper bound on
the minimum distance of locally repairable linear scalar codes,
which is analogous to the Singleton bound. They also show that
pyramid codes, presented in [28], achieve this bound.
Subsequently, the work by Prakash et al. extends the bound to a
more general definition of locally repairable scalar linear codes
[14]. In [13], Papailiopoulos et al. generalize the bound in [12]
to vector codes, possibly non-linear, and establish a per node
storage vs. resilience trade off. They also present locally
repairable coding schemes, which exhibit the "k out of n"
property at the cost of a small amount of excess storage space
per node.

The problem of designing secure DSS against eavesdropping has
been addressed in [10]. In [10], Pawar et al. consider an
eavesdropper which can get access to the data stored on $\ell$
($< k$) storage nodes of a DSS operating at the MBR point with
the "any k out of n" property. They derive an upper bound on the
amount of data that can be stored on such a system without
leaking any information to the eavesdropper, and present a coding
scheme in the "bandwidth limited regime" that achieves this
bound. Shah et al. consider the design of secure regenerating
codes at the MSR point as well [11]. Since the amount of data
downloaded for node repair at the MSR point is more than what is
eventually stored on the repaired node, the eavesdropper may
obtain more information if it is able to access the data
downloaded when a node repair is in progress. Therefore, at the
MSR point, the eavesdropper is modeled as accessing the data
stored on $\ell_1$ nodes and the data downloaded during $\ell_2$
node repairs (corresponding to distinct nodes), with
$\ell_1 + \ell_2 < k$. Shah et al. present a coding scheme, based
on product matrix codes [21], that achieves the bound on secrecy
capacity in [10] at the MBR point. They further use a product
matrix code based solution for the MSR point as well, which
matches the bound in [10] only when $\ell_2 = 0$. Thus, the
secrecy capacity for MSR codes is considered to be open when the
eavesdropper is allowed to observe downloaded information.
Moreover, the solution at the MSR point gives only low rate
schemes, as product matrix codes are themselves low rate codes.

There is a closely related line of work on designing coding
schemes for DSS that are resilient against active attacks, where
an adversary is allowed to modify the content stored on a certain
number of nodes throughout the life span of the DSS. The goal of
the coding scheme is to allow successful decoding of the original
data at a data collector even in the presence of erroneous data
injected by the active adversary [10], [29], [30].

II. System Model and Preliminaries 

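Consider a DSS with $n$ live nodes at a time and a file
$\mathbf{f}$ of size $\mathcal{M}$ over $\mathbb{F}_q$ that needs
to be stored on the DSS. In order to store the file $\mathbf{f}$,
it is divided into $k$ blocks of size $\mathcal{M}/k$ each. Let
$(\mathbf{f}_1, \ldots, \mathbf{f}_k)$ denote these $k$ blocks.
Here, we have $\mathbf{f}_i \in \mathbb{F}_q^{\mathcal{M}/k}$.
These $k$ data blocks are encoded into $n$ data blocks,
$(\mathbf{x}_1, \ldots, \mathbf{x}_n)$, each of length $\alpha$
over $\mathbb{F}_q$ ($\alpha \ge \mathcal{M}/k$). The encoding
process is summarized by the function

$$G : \mathbb{F}_q^{\mathcal{M}} \to (\mathbb{F}_q^{\alpha})^n. \tag{1}$$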

Note that we do not restrict ourselves to the class of linear
functions; the function $G$ may very well be nonlinear. Let
$\mathcal{C}$ denote the codebook associated with the encoding
function $G$. Given the codewords, node $i$ in an $n$-node DSS
stores the encoded block $\mathbf{x}_i$. In this paper, we use
$\mathbf{x}_i$ to represent both the block $\mathbf{x}_i$ and the
storage node storing this encoded block, interchangeably.
Motivated by the MDS property of the codes that are traditionally
developed for data storage in centralized storage systems
[31]-[33], the works on regenerating codes focus on storage
schemes that have the "any k out of n" property.

Given this setup, as the network evolves over failures and
repairs, we use the following notation to denote the contents and
downloaded symbols of the nodes. The symbols stored at node $i$
are represented by the vector $\mathbf{s}_i$, the symbols
transmitted from node $i$ to node $j$ are denoted by
$\mathbf{d}_{i,j}$, and the set $\mathbf{d}_j$ is used to denote
all of the symbols downloaded to node $j$. The DSS is initialized
with the $n$ nodes containing encoded symbols, i.e.,
$\mathbf{s}_i = \mathbf{x}_i$ for $i = 1, \ldots, n$. In the
event of failure of the $i$-th storage node, a new node, namely
the newcomer, is introduced to the system. This node contacts $d$
storage nodes and downloads $\beta$ symbols from each of these
nodes. The newcomer node uses these $d\beta$ downloaded symbols
to regenerate the $\alpha$ symbols $\mathbf{x}_i$, and stores
these symbols. This exact repair process preserves the MDS
property, i.e., data stored on any $k$ nodes (potentially
including the nodes that are repaired) allows the original file
$\mathbf{f}$ to be reconstructed.

We note that, for linear encoding schemes, the symbols of node
$i$ can be written as
$\mathbf{s}_i = \{\mathbf{f}^T \mathbf{g}_1^i, \ldots, \mathbf{f}^T \mathbf{g}_\alpha^i\}$.
In such a case, we refer to $\mathcal{S}_i$ as the subspace
spanned by the vectors
$\{\mathbf{g}_1^i, \ldots, \mathbf{g}_\alpha^i\}$. For node
repairs, using a similar notation, we consider node $i$ to
transmit symbols
$\mathbf{d}_{i,j} = \{\mathbf{f}^T \mathbf{g}_{i,j}^1, \ldots, \mathbf{f}^T \mathbf{g}_{i,j}^\beta\}$
to node $j$, where each of these vectors lies in $\mathcal{S}_i$.
We also refer to $\mathcal{D}_{i,j}$ as the subspace spanned by
the vectors
$\{\mathbf{g}_{i,j}^1, \ldots, \mathbf{g}_{i,j}^\beta\}$.
$\mathcal{D}_j$ will then be referred to as the subspace
downloaded to node $j$, which will have a certain dimension in
this subspace representation. For a given set of nodes
$\mathcal{A}$, we use the notation
$\mathbf{s}_{\mathcal{A}} = \{\mathbf{s}_i, i \in \mathcal{A}\}$.
A similar notation is adopted for the downloaded symbols and the
subspace representation. Throughout the text, we usually denote
vectors by lower-case bold letters, and sets and subspaces by
calligraphic letters. $[n]$ denotes the set
$\{1, 2, \ldots, n\}$.

A. Information flow graph 

In their seminal work [9], Dimakis et al. model the operation of
a DSS as a multicasting problem over an information flow graph
(see Fig. 1). The information flow graph consists of three types
of nodes:

• Source node ($S$): The source node contains the original file
$\mathbf{f}$, which is $\mathcal{M}$ symbols long. The source
node is connected to $n$ nodes.

• Storage nodes ($(x_i^{\mathrm{in}}, x_i^{\mathrm{out}})$): Each
storage node is represented by a pair of nodes, an input node
$x_i^{\mathrm{in}}$ and an output node $x_i^{\mathrm{out}}$.
Here, $x_i^{\mathrm{in}}$ denotes the data downloaded by node
$i$, whereas $x_i^{\mathrm{out}}$ denotes the $\alpha$ symbols
actually stored on node $i$. An edge of capacity $\alpha$ is
introduced between $x_i^{\mathrm{in}}$ and $x_i^{\mathrm{out}}$
to enforce the storage constraint of $\alpha$ symbols per node.
For a newcomer node, $x_i^{\mathrm{in}}$ is connected to the
$x^{\mathrm{out}}$ nodes of $d$ live nodes with links of capacity
$\beta$ symbols each, representing the data downloaded during
node repair.

• Data collector nodes ($\mathrm{DC}_i$): Each data collector
contacts the $x^{\mathrm{out}}$ nodes of $k$ live nodes by edges
of capacity $\infty$ each.

With the aforementioned values of capacities of various edges in
the information flow graph, the DSS is said to employ an
$(n, k, d, \alpha, \beta)$ code. For a given graph $\mathcal{G}$
and data collectors $\mathrm{DC}_i$, the file size that can be
stored in such a DSS can be bounded using the max flow-min cut
theorem for multicasting using network coding [34].

Lemma 1 (Max flow-min cut theorem for multicasting [9], [34]).

$$\mathcal{M} \le \min_{\mathcal{G}} \min_{\mathrm{DC}_i} \mathrm{maxflow}(S \to \mathrm{DC}_i, \mathcal{G}),$$

where $\mathrm{flow}(S \to \mathrm{DC}_i, \mathcal{G})$
represents the flow from the source node $S$ to data collector
$\mathrm{DC}_i$ over the graph $\mathcal{G}$.

Therefore, e.g., for the graph in Fig. 1, an $\mathcal{M}$ symbol
long file can be delivered to a data collector DC only if the min
cut is at least $\mathcal{M}$. In [9], Dimakis et al. consider
$k$ successive node failures and evaluate the min-cut over
possible graphs, and obtain the bound given by

$$\mathcal{M} \le \sum_{i=0}^{k-1} \min\{(d-i)\beta, \alpha\}. \tag{2}$$

This bound can be achieved by employing linear codes, linear
network codes in particular. The codes that attain the bound in
(2) are known as regenerating codes [9]. Given a file size
$\mathcal{M}$, a trade off between storage per node $\alpha$ and
repair bandwidth $\gamma = d\beta$ can be established from (2).
Two classes of codes that achieve the two extreme points of this
trade off are known as minimum storage regenerating (MSR) codes
and minimum bandwidth regenerating (MBR) codes. The former is
obtained by first choosing a minimum storage per node (i.e.,
$\alpha = \mathcal{M}/k$), and then minimizing $\gamma$
satisfying (2), whereas the latter is obtained by first finding
the minimum possible $\gamma$ and then finding the minimum
$\alpha$ in (2). For MSR codes, we have:

$$(\alpha_{\mathrm{msr}}, \beta_{\mathrm{msr}}) = \left(\frac{\mathcal{M}}{k}, \frac{\mathcal{M}}{k(d-k+1)}\right). \tag{3}$$

On the other hand, MBR codes are characterized by

$$(\alpha_{\mathrm{mbr}}, \beta_{\mathrm{mbr}}) = \left(\frac{2\mathcal{M}d}{k(2d-k+1)}, \frac{2\mathcal{M}}{k(2d-k+1)}\right). \tag{4}$$


For a given DSS with $d \le n - 1$, it can be observed that
having $d = n - 1$ reduces the repair bandwidth at both the MSR
and MBR points. Though the bound in (2) is derived for functional
repair, the bound and the achievability of the MSR and MBR points
are shown to be tight for exact repair as well.
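
The relationship between the cut-set bound (2) and the operating points (3) and (4) can be verified with exact arithmetic. The sketch below uses illustrative parameter values ($\mathcal{M} = 60$, $k = 5$) that are assumptions chosen for the example, and hypothetical function names. It checks that both points meet (2) with equality and that the repair bandwidth $\gamma = d\beta$ shrinks as $d$ grows.

```python
from fractions import Fraction


def cutset_bound(k, d, alpha, beta):
    """Right-hand side of (2): sum over i = 0..k-1 of min{(d - i) * beta, alpha}."""
    return sum(min((d - i) * beta, alpha) for i in range(k))


def msr_point(M, k, d):
    """(alpha, beta) at the MSR point, as given in (3)."""
    return Fraction(M, k), Fraction(M, k * (d - k + 1))


def mbr_point(M, k, d):
    """(alpha, beta) at the MBR point, as given in (4)."""
    denom = k * (2 * d - k + 1)
    return Fraction(2 * M * d, denom), Fraction(2 * M, denom)


M, k = 60, 5                    # illustrative values, not taken from the paper
gamma_msr, gamma_mbr = [], []
for d in (5, 7, 9):             # number of helper nodes contacted during repair
    a_msr, b_msr = msr_point(M, k, d)
    a_mbr, b_mbr = mbr_point(M, k, d)
    # Both extreme points store exactly M symbols according to (2).
    assert cutset_bound(k, d, a_msr, b_msr) == M
    assert cutset_bound(k, d, a_mbr, b_mbr) == M
    gamma_msr.append(d * b_msr)
    gamma_mbr.append(d * b_mbr)

print(gamma_msr, gamma_mbr)
```

For these parameters the repair bandwidth at both points decreases as more helpers are contacted, consistent with the remark that $d = n - 1$ is the most favorable choice.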

B. MRD codes 

Most of the encoding schemes presented in this paper use optimal
rank-metric codes. An $[N \times m, \varrho, \varsigma]$
rank-metric code $\mathcal{C}$ is a linear code whose codewords
are $N \times m$ matrices over $\mathbb{F}_q$; they form a linear
subspace of dimension $\varrho$ of $\mathbb{F}_q^{N \times m}$,
and for each two distinct codewords $A$ and $B$,
$d_R(A, B) \ge \varsigma$, where $d_R(\cdot, \cdot)$ denotes the
rank distance defined by

$$d_R(A, B) \overset{\mathrm{def}}{=} \mathrm{rank}(A - B).$$

For an $[N \times m, \varrho, \varsigma]$ rank-metric code
$\mathcal{C}$ we have
$\varrho \le \min\{N(m - \varsigma + 1), m(N - \varsigma + 1)\}$
[35]-[37]. This bound is called the Singleton bound for the rank
metric, and the codes that achieve this bound are called maximum
rank distance (MRD) codes. A construction of MRD codes was given
by Gabidulin [36]. These codes can be seen as the analogs of
Reed-Solomon codes for the rank metric. A codeword in an
$[N \times m, \varrho, \varsigma]$ rank-metric code
$\mathcal{C}$, for $m \le N$, can be represented by a vector
$\mathbf{c} = [c_1, c_2, \ldots, c_m]$ over $\mathbb{F}_{q^N}$.
In a similar way to Reed-Solomon codes, Gabidulin codes can be
obtained by evaluation of polynomials; however, for Gabidulin
codes a special family of polynomials, called linearized
polynomials, is used. A linearized polynomial $f(y)$ over
$\mathbb{F}_{q^N}$ of $q$-degree $n$ has the form
$f(y) = \sum_{i=0}^{n} a_i y^{q^i}$, where
$a_i \in \mathbb{F}_{q^N}$ and $a_n \ne 0$.

A codeword in a Gabidulin code $\mathcal{C}$ is defined as
$\mathbf{c} = (f(y_1), f(y_2), \ldots, f(y_m))$, where $f(y)$ is
a linearized polynomial of $q$-degree $m - \varsigma$ with
coefficients given by the information message, and
$y_1, \ldots, y_m \in \mathbb{F}_{q^N}$ are linearly independent
over $\mathbb{F}_q$ [36]. Note that evaluation of a linearized
polynomial is an $\mathbb{F}_q$-linear transformation from
$\mathbb{F}_{q^N}$ to itself, i.e., for any
$a, b \in \mathbb{F}_q$ and $y_1, y_2 \in \mathbb{F}_{q^N}$, we
have

$$f(a y_1 + b y_2) = a f(y_1) + b f(y_2).$$
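
The linearity property of linearized polynomials can be checked exhaustively in a small field. The following sketch assumes $q = 2$ and $N = 4$, represents $\mathbb{F}_{2^4}$ with the irreducible polynomial $x^4 + x + 1$, and uses arbitrary illustrative coefficients; none of these choices come from the paper. Since the only scalars in $\mathbb{F}_2$ are 0 and 1, $\mathbb{F}_2$-linearity reduces to additivity.

```python
from itertools import product

N = 4                  # extension degree: elements of GF(2^4) are 4-bit integers
MODULUS = 0b10011      # x^4 + x + 1, irreducible over GF(2)


def gf_mul(a, b):
    """Multiply two elements of GF(2^4) (shift-and-add with modular reduction)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << N):
            a ^= MODULUS
    return result


def frobenius(y, i):
    """Return y^(2^i) by squaring i times."""
    for _ in range(i):
        y = gf_mul(y, y)
    return y


def eval_linearized(coeffs, y):
    """Evaluate f(y) = sum_i coeffs[i] * y^(2^i); addition in GF(2^4) is XOR."""
    acc = 0
    for i, a in enumerate(coeffs):
        acc ^= gf_mul(a, frobenius(y, i))
    return acc


coeffs = [0b0011, 0b0101, 0b1001]   # arbitrary a_0, a_1, a_2 (q-degree 2)

# Check f(y1 + y2) = f(y1) + f(y2) for every pair of field elements.
is_linear = all(
    eval_linearized(coeffs, y1 ^ y2)
    == eval_linearized(coeffs, y1) ^ eval_linearized(coeffs, y2)
    for y1, y2 in product(range(1 << N), repeat=2)
)
print(is_linear)
```

The check succeeds because squaring is additive in characteristic 2, so every term $a_i y^{2^i}$ is additive in $y$.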




Fig. 1: Information flow graph of DSS. Assuming that $x_1$ fails first, the newcomer $x_5$ contacts $\{x_2, x_3, x_4\}$ during node repair.
In the event of a second node failure, $x_2$, data is downloaded from $\{x_3, x_4, x_5\}$ by the newcomer $x_6$.



C. Eavesdropper model 

In this paper, we consider the eavesdropper model defined in
[11], which generalizes the eavesdropper model considered in
[10]. In [10], Pawar et al. consider a passive eavesdropper, who
can access the data stored on $\ell$ ($< k$) storage nodes. The
eavesdropper is assumed to know the coding scheme employed by the
DSS. At the MBR point, a newcomer downloads
$\alpha_{\mathrm{mbr}} = \gamma_{\mathrm{mbr}} = d\beta_{\mathrm{mbr}}$
amount of data. Thus, an eavesdropper does not gain any
additional information if it is allowed to access the data
downloaded during repair. However, at the MSR point, the repair
bandwidth is strictly greater than the per node storage
$\alpha_{\mathrm{msr}}$, and an eavesdropper potentially gains
more information if it has access to the data downloaded during
node repair as well. Motivated by this, we consider an
$(\ell_1, \ell_2)$ eavesdropper, which can access the stored data
of nodes in the set $\mathcal{E}_1$, and additionally can access
both the stored and downloaded data at the nodes in the set
$\mathcal{E}_2$, with $\ell_1 = |\mathcal{E}_1|$ and
$\ell_2 = |\mathcal{E}_2|$. Hence, the eavesdropper has access to
$x_i^{\mathrm{out}}, x_j^{\mathrm{in}}, x_j^{\mathrm{out}}$ for
$i \in \mathcal{E}_1$ and $j \in \mathcal{E}_2$. We summarize the
eavesdropper model together with the definition of achievability
of a secure file size in the following.

Definition 2 (Security against an $(\ell_1, \ell_2)$
eavesdropper). A distributed storage system is said to achieve a
secure file size of $\mathcal{M}^s$ against an
$(\ell_1, \ell_2)$ eavesdropper if, for any sets $\mathcal{E}_1$
and $\mathcal{E}_2$ of size $\ell_1$ and $\ell_2$, respectively,
$I(\mathbf{f}^s; \mathbf{e}) = 0$. Here $\mathbf{f}^s$ is the
secure file of size $\mathcal{M}^s$, which is first encoded to a
file $\mathbf{f}$ of size $\mathcal{M}$, and $\mathbf{e}$ is the
eavesdropper observation vector given by
$\mathbf{e} \triangleq \{x_i^{\mathrm{out}}, x_j^{\mathrm{in}}, x_j^{\mathrm{out}} : i \in \mathcal{E}_1, j \in \mathcal{E}_2\}$.

Note that this definition coincides with the $\{\ell, \ell'\}$
secure distributed storage system in [11], where
$\ell = \ell_1 + \ell_2$ and $\ell' = \ell_2$.

In MSR coding schemes with high rate, the number of parity-check
nodes is negligible relative to the number of systematic nodes.
Hence, in the following we consider codes with optimal exact
repair of systematic nodes, and we also assume that
$\mathcal{E}_2$ is contained in the set of systematic nodes.

We remark that, as will be clear from the following sections,
when a file $\mathbf{f}$ of size $\mathcal{M}$ is stored in DSS
and the secure file size achieved is $\mathcal{M}^s$, the
remaining $\mathcal{M} - \mathcal{M}^s$ symbols can be utilized
as public data, which does not have security constraints. Yet,
noting the possibility of storing public data, we will refer to
this uniformly distributed part as the random data, which is
utilized to achieve security. Throughout the text, we use the
following lemma to show that the proposed codes satisfy the
secrecy constraints.

Lemma 3. Consider a system with information bits $\mathbf{u}$,
random bits $\mathbf{r}$ (independent of $\mathbf{u}$), and an
eavesdropper with observations given by $\mathbf{e}$. If
$H(\mathbf{e}) \le H(\mathbf{r})$ and
$H(\mathbf{r} \mid \mathbf{u}, \mathbf{e}) = 0$, then
$I(\mathbf{u}; \mathbf{e}) = 0$.

Proof: See Appendix A.
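
The appendix proof is not reproduced in this section. One standard way to see why the two conditions of Lemma 3 suffice, offered here as a sketch that may differ from the authors' argument, uses only the chain rule, the independence of $\mathbf{r}$ and $\mathbf{u}$, and the non-negativity of entropy and mutual information:

```latex
\begin{align*}
H(\mathbf{e} \mid \mathbf{u})
  &\ge H(\mathbf{e} \mid \mathbf{u}) - H(\mathbf{e} \mid \mathbf{u}, \mathbf{r})
   = I(\mathbf{e}; \mathbf{r} \mid \mathbf{u}) \\
  &= H(\mathbf{r} \mid \mathbf{u}) - H(\mathbf{r} \mid \mathbf{u}, \mathbf{e})
   = H(\mathbf{r}) - 0, \\
I(\mathbf{u}; \mathbf{e})
  &= H(\mathbf{e}) - H(\mathbf{e} \mid \mathbf{u})
   \le H(\mathbf{e}) - H(\mathbf{r}) \le 0 .
\end{align*}
```

Since mutual information is non-negative, $I(\mathbf{u}; \mathbf{e}) = 0$ follows.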

D. Locally repairable codes 

First we present a general definition of the minimum distance of
a code, and then we give an equivalent formulation of it, which
will be used in the sequel.

Definition 4 (Minimum distance of a code). Let $\mathcal{E}$
denote a set of nodes that get erased. For a code associated with
encoding function $G$, as defined in (1), its minimum distance
$d_{\min}$ is defined to be the cardinality of the smallest set
$\mathcal{E}_m$ for which we have

$$H(\mathbf{x}_{[n] \setminus \mathcal{E}_m}) = H(\mathbf{x}_{i_1}, \ldots, \mathbf{x}_{i_{n - |\mathcal{E}_m|}}) < \mathcal{M}. \tag{5}$$

Here $\{i_1, \ldots, i_{n - |\mathcal{E}_m|}\} = [n] \setminus \mathcal{E}_m$.

According to an alternate definition for $d_{\min}$, as given in
[12] for scalar linear codes and later extended by [13] for
general codes,

$$d_{\min} = n - \max_{\mathcal{A} \subseteq [n] : H(\mathbf{x}_{\mathcal{A}}) < \mathcal{M}} |\mathcal{A}|, \tag{6}$$

where $\mathcal{A} = \{i_1, \ldots, i_{|\mathcal{A}|}\} \subseteq [n]$
and $\mathbf{x}_{\mathcal{A}} = (\mathbf{x}_{i_1}, \ldots, \mathbf{x}_{i_{|\mathcal{A}|}})$.
It follows from the definition of $d_{\min}$ that a data
collector can reconstruct the original data, i.e., $\mathbf{f}$,
by contacting any set of $n - d_{\min} + 1$ storage nodes in the
DSS. We are interested in ensuring this property of the DSS
during its entire life span, despite its dynamic nature due to
node repairs. Besides this, in locally repairable DSS, we are
interested in coding schemes, i.e., $\mathcal{C}$, that have the
following property:

$(r, \delta)$ locality: For each stored block $\mathbf{s}_i$ (of
length $\alpha$), there exists a set of nodes $\Gamma(i)$ of size
at most $r + \delta - 1$ such that all elements of $\Gamma(i)$
have the following two properties:

• Any set of $r$ nodes in $\Gamma(i)$ are independent, i.e., for
any $\{j_1, \ldots, j_r\} \subseteq \Gamma(i)$, we have

$$H(\mathbf{s}_{j_1}, \ldots, \mathbf{s}_{j_r}) = r\alpha. \tag{7}$$

• Each element $j \in \Gamma(i)$ can be written as a function of
any set of $r$ elements in $\Gamma(i)$ (not containing $j$). In
other words, the minimum distance of $\mathcal{C}|_{\Gamma(i)}$,
the code obtained by puncturing $\mathcal{C}$ over $\Gamma(i)$,
is at least $\delta$.

Codes that satisfy this property are called $(r, \delta, \alpha)$
locally repairable codes.
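
For a scalar linear code with a uniformly distributed message, the entropy of a set of coded symbols equals the rank of the corresponding generator matrix columns, so definition (6) can be evaluated by brute force. The sketch below is an illustration under that assumption for small binary codes (with $\alpha = 1$ and $\mathcal{M} = k$); the function names and the example codes are chosen here and do not appear in the paper. It compares (6) against the usual minimum Hamming weight.

```python
from itertools import combinations


def gf2_rank(columns):
    """Rank over GF(2) of column vectors given as integer bitmasks."""
    basis = {}                      # leading-bit position -> reduced vector
    for v in columns:
        while v:
            top = v.bit_length() - 1
            if top not in basis:
                basis[top] = v
                break
            v ^= basis[top]
    return len(basis)


def dmin_entropy_form(columns, k):
    """Eq. (6) for a scalar linear code: n minus the largest |A| with rank(G_A) < k."""
    n = len(columns)
    for size in range(n, -1, -1):
        if any(gf2_rank(sub) < k for sub in combinations(columns, size)):
            return n - size
    return n


def dmin_hamming(columns, k):
    """Smallest Hamming weight of a nonzero codeword, by enumerating all messages."""
    best = len(columns)
    for msg in range(1, 1 << k):
        weight = sum(bin(msg & col).count("1") % 2 for col in columns)
        best = min(best, weight)
    return best


# Generator-matrix columns as bitmasks over the k message bits (illustrative codes).
single_parity = [0b001, 0b010, 0b100, 0b111]                        # [4, 3] code
repetition = [0b1, 0b1, 0b1]                                        # [3, 1] code
hamming = [0b0001, 0b0010, 0b0100, 0b1000, 0b1011, 0b1101, 0b1110]  # [7, 4] code

for cols, k in ((single_parity, 3), (repetition, 1), (hamming, 4)):
    print(dmin_entropy_form(cols, k), dmin_hamming(cols, k))
```

The two computations agree on these examples, which is the equivalence between (5) and (6) specialized to scalar linear codes.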

III. Secrecy in repair bandwidth efficient DSS 

Considering that the eavesdropped nodes may not carry secure
information to the data collectors in the bound given by (2),
[10] establishes the following upper bound on the secure file
size when the eavesdropper observes the content of $\ell$ nodes:

$$\mathcal{M}^s \le \sum_{i=\ell+1}^{k} \min\{(d - i + 1)\beta, \alpha\}. \tag{8}$$

Pawar et al. show that this bound is tight in the bandwidth
limited regime, $\gamma \le \Gamma = (n-1)\alpha$ with
$d = n - 1$, by presenting a coding scheme that is secure against
a passive eavesdropper observing $\ell$ storage nodes. This point
essentially corresponds to the MBR point (see (4)) when a data
collector contacts all the remaining nodes. [11] proposes product
matrix based secure coding schemes achieving this bound for any
$\ell$ at the MBR point. However, the coding scheme proposed in
[11] can only store a secure file size of
$(k - \ell_1 - \ell_2)(\alpha - \ell_2\beta)$ at the MSR point.
At the MSR point, the bound in (8) reduces to

$$\mathcal{M}^s \le (k - \ell_1 - \ell_2)\alpha.$$

From these, it is concluded in [11] that the proposed scheme
achieves secrecy capacity only when $\ell_2 = 0$. This
corresponds to the scenario in which the eavesdroppers are not
allowed to observe downloaded packets. This leaves the following
questions open:

• Can the bound (8) be further tightened for the MSR point?

• Is it possible to get a secure code at the MSR point that
outperforms the code proposed in [11]?
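
The gap that motivates these two questions can be made concrete numerically. The sketch below assumes illustrative parameters ($\mathcal{M} = 60$, $k = 5$, $d = 9$) and hypothetical function names. It evaluates the bound (8) at the MSR point with $\ell = \ell_1 + \ell_2$ and compares it with the secure file size $(k - \ell_1 - \ell_2)(\alpha - \ell_2\beta)$ quoted above for the scheme of [11].

```python
from fractions import Fraction


def pawar_bound(k, d, alpha, beta, ell):
    """Right-hand side of (8): sum over i = ell+1..k of min{(d - i + 1) * beta, alpha}."""
    return sum(min((d - i + 1) * beta, alpha) for i in range(ell + 1, k + 1))


def product_matrix_msr_size(k, alpha, beta, ell1, ell2):
    """Secure file size (k - l1 - l2)(alpha - l2 * beta) quoted in the text for [11]."""
    return (k - ell1 - ell2) * (alpha - ell2 * beta)


M, k, d = 60, 5, 9                       # illustrative values, not from the paper
alpha = Fraction(M, k)                   # MSR per node storage, from (3)
beta = Fraction(M, k * (d - k + 1))      # MSR per helper download, from (3)

rows = []
for ell1, ell2 in [(1, 0), (1, 1), (1, 2), (2, 2)]:
    bound = pawar_bound(k, d, alpha, beta, ell1 + ell2)
    achieved = product_matrix_msr_size(k, alpha, beta, ell1, ell2)
    rows.append((ell1, ell2, bound, achieved))
    print(ell1, ell2, bound, achieved)
```

In this example the two quantities coincide only for $\ell_2 = 0$, which is the situation described in the paragraph above.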



In this section, we answer both questions affirmatively. We first
derive a generic upper bound on the amount of data that can be
securely stored on DSS for bandwidth efficient repairable codes
at the MSR point, which also applies to bandwidth efficient exact
repairable codes. Next, we prove a result specific to exact
repairable codes for $d = n - 1$, which allows us to provide an
upper bound on the file size that can be securely stored on a DSS
against an $(\ell_1, \ell_2)$-eavesdropper. This bound is tighter
than the one that can be obtained from the generic bound we
provide. We subsequently combine the classical secret sharing
scheme due to [15] with an existing class of exact repairable MSR
codes to securely store data in the presence of an
$(\ell_1, \ell_2)$ eavesdropper. We show that this approach gives
a higher rate coding scheme compared to that of [11] and achieves
the secrecy capacity when $\ell_2 \le 2$ for any $\ell_1$.

A. Improved bound on secrecy capacity at the MSR point 

In order to get the desired bound, we rely on the standard approach of computing a cut in the information flow graph associated with the DSS. We consider a particular pattern of eavesdropped nodes, where the eavesdropper observes the content stored on $\ell_1$ initial nodes and the data downloaded during the first $\ell_2$ node failures that do not involve the already eavesdropped $\ell_1$ nodes. Using the min-cut max-flow theorem, this case translates into an upper bound on the secrecy capacity for any MDS encoding scheme that operates at the MSR point (see (O)), one extreme of the repair bandwidth vs. per node storage trade-off defined in ©.

Theorem 5. For a bandwidth efficient repairable $(n,k)$ MDS code, we have

$$\mathcal{M}^s \le \sum_{i=\ell_1+1}^{k-\ell_2}\left(\alpha - \dim\left(\sum_{j=1}^{\ell_2} V_{i,n+j}\right)\right). \tag{9}$$

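Proof: Consider Fig. 2, which describes a particular case that may arise during the lifespan of a DSS. Here, $x_1, x_2, \ldots, x_n$ represent the original $n$ storage nodes in the DSS as defined in Sec. II. Assume that nodes $x_{k-\ell_2+1}, \ldots, x_k$ fail subsequently in the order specified by their indices. These $\ell_2$ failures are repaired by introducing nodes $x_{n+1}, \ldots, x_{n+\ell_2}$ in the system, following the node repair process associated with the coding scheme employed by the DSS. Consider $\mathcal{E}_1 = \{x_1, \ldots, x_{\ell_1}\}$ as the set of $\ell_1$ nodes where the eavesdropper observes the stored content, and let $\mathcal{E}_2 = \{x_{n+1}, \ldots, x_{n+\ell_2}\}$ be the set of nodes that are exposed to the eavesdropper during their node repair, allowing the eavesdropper to access all the data downloaded during the repair of the nodes in $\mathcal{E}_2$. Let $\mathcal{R}$ denote the set of $k-(\ell_1+\ell_2)$ remaining original nodes $\{x_{\ell_1+1}, \ldots, x_{k-\ell_2}\}$, which are not observed by the eavesdropper directly; information stored on these nodes may leak to the eavesdropper only when these nodes participate in node repair. Assume that a data collector contacts a set of $k$ nodes given by $\mathcal{K} = \mathcal{E}_1 \cup \mathcal{E}_2 \cup \mathcal{R}$ in order to reconstruct the original data. For a file $\mathbf{f}^s$ to be securely stored on the DSS, we have

$$\begin{aligned}
H(\mathbf{f}^s) &= H(\mathbf{f}^s \mid \mathbf{s}_{\mathcal{E}_1}, \mathbf{d}_{\mathcal{E}_2}) && (10)\\
&= H(\mathbf{f}^s \mid \mathbf{s}_{\mathcal{E}_1}, \mathbf{d}_{\mathcal{E}_2}) - H(\mathbf{f}^s \mid \mathbf{s}_{\mathcal{E}_1}, \mathbf{s}_{\mathcal{E}_2}, \mathbf{s}_{\mathcal{R}}) && (11)\\
&\le H(\mathbf{f}^s \mid \mathbf{s}_{\mathcal{E}_1}, \mathbf{d}_{\mathcal{E}_2}) - H(\mathbf{f}^s \mid \mathbf{s}_{\mathcal{E}_1}, \mathbf{d}_{\mathcal{E}_2}, \mathbf{s}_{\mathcal{R}})\\
&= I(\mathbf{f}^s; \mathbf{s}_{\mathcal{R}} \mid \mathbf{s}_{\mathcal{E}_1}, \mathbf{d}_{\mathcal{E}_2})\\
&\le H(\mathbf{s}_{\mathcal{R}} \mid \mathbf{s}_{\mathcal{E}_1}, \mathbf{d}_{\mathcal{E}_2})\\
&\le H(\mathbf{s}_{\mathcal{R}} \mid \mathbf{d}_{\mathcal{E}_2})\\
&= \sum_{i=\ell_1+1}^{k-\ell_2} H(\mathbf{s}_i \mid \mathbf{s}_{\ell_1+1}, \ldots, \mathbf{s}_{i-1}, \mathbf{d}_{\mathcal{E}_2})\\
&\le \sum_{i=\ell_1+1}^{k-\ell_2} H(\mathbf{s}_i \mid \mathbf{d}_{i,n+1}, \ldots, \mathbf{d}_{i,n+\ell_2})\\
&\le \sum_{i=\ell_1+1}^{k-\ell_2} \left(\alpha - \dim\left(\sum_{j=1}^{\ell_2} V_{i,n+j}\right)\right). && (12)
\end{aligned}$$

Here (10) follows from the fact that the coding scheme employed in the DSS is secure against an $(\ell_1,\ell_2)$-eavesdropper, i.e., $I(\mathbf{f}^s; \mathbf{s}_{\mathcal{E}_1}, \mathbf{d}_{\mathcal{E}_2}) = H(\mathbf{f}^s) - H(\mathbf{f}^s \mid \mathbf{s}_{\mathcal{E}_1}, \mathbf{d}_{\mathcal{E}_2}) = 0$. (11) is a consequence of the MDS property of the code, i.e., the original data can be recovered from the data stored on any set of $k$ nodes. ■

Fig. 2: Node repair in the presence of an $(\ell_1,\ell_2)$ passive eavesdropper.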

In Theorem 5, $\dim\left(\sum_{j=1}^{\ell_2} V_{i,n+j}\right)$ can be trivially lower bounded by $\beta$ to obtain the following corollary.

Corollary 6. For a DSS employing an $(n,k,d,\alpha,\beta)$ MSR regenerating code, we have:

$$\mathcal{M}^s \le (k-\ell_1-\ell_2)(\alpha-\beta). \tag{13}$$

This shows that the secure code construction proposed in [11] is optimal for $\ell_2 = 1$.

The following lemma is specific to exact repairable linear codes at the MSR point that employ interference alignment for node repair with $d = n-1$. It is shown in [39] that interference alignment is a necessary component of an exact repairable linear scalar ($\beta = 1$) MSR code. The necessity of interference alignment holds for $\beta > 1$ as well. Therefore, the following bound is fairly general and applies to all known exact repairable codes at the MSR point. Following the standard terminology in the DSS literature, each node $i$ has $\beta \times \alpha$ repair matrices $\{V_{i,j}\}$ associated with the remaining nodes $j \ne i$. In the event of failure of node $j$, a newcomer downloads $V_{i,j}\mathbf{x}_i$ from every node $i$, $i \ne j$. In the rest of the section, we use $V_{i,j}$ to denote both a matrix and the row-space of the matrix.



Lemma 7. Consider an $(n,k)$-DSS storing data in a systematic form with $(n-k)$ linear parity nodes. Assume that $d = n-1$, i.e., all the remaining nodes are contacted to repair a failed node. Let $V_{i,j}$ be the repair matrices associated with node $i$, which are used to perform interference alignment based node repair for node $j$. Then for each $i \in [k]$, i.e., for each systematic node, we have

$$\dim\left(\bigcap_{j \in \mathcal{A}} V_{i,j}\right) = \operatorname{rank}\left(\bigcap_{j \in \mathcal{A}} V_{i,j}\right) \le \frac{\alpha}{(n-k)^{|\mathcal{A}|}}, \tag{14}$$

where $\mathcal{A} \subseteq [k] \setminus \{i\}$.

Proof: See Appendix 151 ■ 

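It follows from the well-known dimension formula for vector spaces that

$$\begin{aligned}
\dim(V_{i,n+1} + V_{i,n+2}) &= \dim(V_{i,n+1}) + \dim(V_{i,n+2}) - \dim(V_{i,n+1} \cap V_{i,n+2})\\
&= \beta + \beta - \dim(V_{i,n+1} \cap V_{i,n+2})\\
&\ge 2\beta - \frac{\alpha}{(n-k)^2},
\end{aligned} \tag{15}$$

where (15) follows from Lemma 7. Now combining (15) with Theorem 5, we get the following corollary:

Corollary 8. Given a bandwidth efficient repairable $(n,k)$ MDS code with $d = n-1$ that employs interference alignment to perform node repair, for $\ell_2 \le 2$ we have

$$\mathcal{M}^s \le (k-\ell_1-\ell_2)\left(\alpha - \kappa(\alpha,\beta,\ell_2)\right), \tag{16}$$

where

$$\kappa(\alpha,\beta,\ell_2) = \begin{cases} \beta & \text{if } \ell_2 = 1,\\ 2\beta - \dfrac{\alpha}{(n-k)^2} & \text{if } \ell_2 = 2. \end{cases} \tag{17}$$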



B. Construction of secure MSR codes for $d = n-1$

In this subsection we present a construction which is based on the concatenation of MRD codes [35]-[37] and optimal repair MDS array codes, called zigzag codes [24], [25]. The construction of an $(n,k)$ zigzag code is given in [25]. Let $p = n-k$. Then, this construction provides a $p^k \times n$ array with a $p^k \times k$ systematic part. The repair of a systematic node (column) $j$ is performed by accessing rows $Y_j = \{x \in [0, p^k-1] : x \cdot e_j = 0\}$, where $e_j$ is an element of the standard basis of $\mathbb{Z}_p^k$, and $x$ is represented by an element of $\mathbb{Z}_p^k$.

We first state the following property of this repair process.

Lemma 9. Assume that an eavesdropper gains access to the data stored in $\ell_1$ nodes and to the data stored as well as the data downloaded during node repair in $\ell_2$ systematic nodes in a $(k+p,k)$ zigzag code. Then the eavesdropper can only observe

$$k p^k - p^k (k-\ell_1-\ell_2)\left(1-\frac{1}{p}\right)^{\ell_2}$$

systematic symbols.

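Proof: First note that

$$|Y_i| = p^{k-1}, \qquad |Y_i \cap Y_j| = p^{k-2} \ \text{ for } i \ne j,$$

and in general

$$|Y_{i_1} \cap Y_{i_2} \cap \cdots \cap Y_{i_t}| = p^{k-t} \ \text{ for distinct } i_1, i_2, \ldots, i_t.$$

Let $\mathcal{E}_2 \subseteq [k]$ be the set of size $\ell_2$ of systematic nodes (columns) where an eavesdropper has access to the stored data and to the data downloaded during node repair. Then, by using the inclusion-exclusion principle, we have

$$\begin{aligned}
\Big|\bigcup_{j \in \mathcal{E}_2} Y_j\Big| &= \ell_2\, p^{k-1} - \binom{\ell_2}{2} p^{k-2} + \binom{\ell_2}{3} p^{k-3} - \cdots + (-1)^{\ell_2-1} p^{k-\ell_2}\\
&= \left(-p^{k-\ell_2}\right)\left((p-1)^{\ell_2} - p^{\ell_2}\right)\\
&= p^k - p^{k-\ell_2}(p-1)^{\ell_2}.
\end{aligned}$$

Then, the eavesdropper can observe

$$\begin{aligned}
p^k(\ell_1+\ell_2) + (k-\ell_1-\ell_2)\Big|\bigcup_{j \in \mathcal{E}_2} Y_j\Big| &= p^k(\ell_1+\ell_2) + (k-\ell_1-\ell_2)\left(p^k - p^{k-\ell_2}(p-1)^{\ell_2}\right)\\
&= k p^k - p^k(k-\ell_1-\ell_2)\left(1-\frac{1}{p}\right)^{\ell_2}
\end{aligned}$$

systematic symbols. ■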
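The counting argument can be confirmed by brute force for small parameters. The following is a minimal sketch; the function names and the choice $p = 3$, $k = 4$, $\ell_1 = 1$, $\ell_2 = 2$ are assumptions made here for illustration. It models $Y_j$ as the set of vectors in $\mathbb{Z}_p^k$ whose $j$-th coordinate is zero, as in the definition above.

```python
from itertools import product


def union_size(p, k, repaired):
    """Count rows x in Z_p^k with x_j == 0 for at least one repaired column j."""
    return sum(1 for x in product(range(p), repeat=k)
               if any(x[j] == 0 for j in repaired))


def observed_symbols(p, k, l1, l2):
    """Closed form of Lemma 9: k p^k - p^k (k - l1 - l2) (1 - 1/p)^l2.

    Uses p^k (1 - 1/p)^l2 = p^(k - l2) (p - 1)^l2 to stay in integers.
    """
    return k * p**k - (k - l1 - l2) * p**(k - l2) * (p - 1)**l2


p, k, l1, l2 = 3, 4, 1, 2       # small hypothetical parameters
repaired = list(range(l2))      # by symmetry, which l2 columns are chosen is irrelevant

u = union_size(p, k, repaired)
assert u == p**k - p**(k - l2) * (p - 1)**l2

# l1 + l2 columns are fully observed; each remaining systematic column
# leaks only the rows that were accessed during the l2 repairs.
assert p**k * (l1 + l2) + (k - l1 - l2) * u == observed_symbols(p, k, l1, l2)
```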

We now detail the achievability scheme of this section. Let $[N \times k\alpha, Nk\alpha, 1]$ be a Gabidulin MRD code, $N \ge k\alpha$, with $\alpha = p^k$ [36]. Let $f(y) = \sum_{i=0}^{k\alpha-1} c_i y^{q^i}$, $c_i \in \mathbb{F}_{q^N}$, be the corresponding linearized polynomial, i.e., the coefficients of this polynomial are chosen as the information symbols, and a codeword of length $k\alpha$ (over $\mathbb{F}_{q^N}$) is obtained by its evaluation at $k\alpha$ linearly independent (over $\mathbb{F}_q$) elements of $\mathbb{F}_{q^N}$.

Secrecy achieving encoding of the data is performed as follows. First, we choose $kp^k - p^k(k-\ell_1-\ell_2)\left(1-\frac{1}{p}\right)^{\ell_2}$ random symbols over $\mathbb{F}_{q^N}$ and consider them as the largest coefficients of the encoding polynomial. Then, we choose the remaining $p^k(k-\ell_1-\ell_2)\left(1-\frac{1}{p}\right)^{\ell_2}$ coefficients of the polynomial using the symbols of the secure file. The result of this MRD encoding is then encoded by using a $(k+p,k)$ zigzag code. Note that since the evaluation of $f(\cdot)$ is an $\mathbb{F}_q$-linear function, all the symbols in the parity-check nodes of the final code are given by the evaluation of $f(\cdot)$ at linear combinations of the evaluation elements of the systematic nodes. This property of the constructed code will be called the linearized property. This code achieves the following secure file size.

Theorem 10. The secure code obtained by MRD secrecy precoding of a zigzag code at the MSR point with $\alpha = p^k$ achieves a secure file size given by

$$(k-\ell_1-\ell_2)\, p^k \left(1-\frac{1}{p}\right)^{\ell_2},$$

where $p = n-k$, for $d = n-1$. In addition, for any $(\ell_1,\ell_2)$ such that $\ell_2 \le 2$, this code attains the upper bound on the secure file size given in Corollary 8 and achieves the secrecy capacity at the MSR point with $d = n-1$.

Proof: The repair and data reconstruction properties of the proposed code follow from the construction of zigzag codes [24], [25]. The proof of security follows by Lemma 9, Lemma 3 and the linearized property of the code. (We note that a similar proof of security when utilizing polynomials for encoding is provided in the seminal paper of A. Shamir on secret sharing [Q3].)

Substituting $\ell_2 = 1$ (or 2), $\alpha = p^k$ and $\beta = \alpha/p = p^{k-1}$ in (16) shows that the proposed code construction achieves the upper bound on the secure file size specified in Corollary 8 for $\ell_2 \le 2$. ■
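The substitution in the last step can be verified with exact rational arithmetic. The sketch below is an illustration only: the function names are chosen here, and the third function simply evaluates the expression $(k-\ell_1-\ell_2)(\alpha-\ell_2\beta)$ quoted above for the scheme of [11] at the same $(\alpha,\beta)$, which is an assumption made for the sake of comparison.

```python
from fractions import Fraction


def theorem10_size(k, p, l1, l2):
    """Secure file size of Theorem 10 with alpha = p^k."""
    alpha = p**k
    return (k - l1 - l2) * alpha * (1 - Fraction(1, p))**l2


def corollary8_bound(k, p, l1, l2):
    """Upper bound (16) with kappa from (17), alpha = p^k, beta = p^(k-1), n - k = p."""
    alpha, beta = p**k, p**(k - 1)
    kappa = {1: Fraction(beta), 2: 2 * beta - Fraction(alpha, p**2)}[l2]
    return (k - l1 - l2) * (alpha - kappa)


def quoted_size_of_ref11(k, p, l1, l2):
    """(k - l1 - l2)(alpha - l2 * beta), evaluated at the same alpha and beta."""
    alpha, beta = p**k, p**(k - 1)
    return (k - l1 - l2) * (alpha - l2 * beta)


for p in (2, 3, 4):
    for k in (3, 4, 5):
        for l2 in (1, 2):
            for l1 in range(k - l2):
                achieved = theorem10_size(k, p, l1, l2)
                assert achieved == corollary8_bound(k, p, l1, l2)
                # Equal for l2 = 1, strictly larger for l2 = 2.
                assert achieved >= quoted_size_of_ref11(k, p, l1, l2)
```

The gap for $\ell_2 = 2$ is exactly $(k-\ell_1-\ell_2)\,\alpha/p^2$ per file.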



IV. New Bounds and Constructions for Locally 
Repairable Codes 

In this section, we study the notion of local repairability for DSS. As opposed to the line of work on scalar locally repairable codes [12], [14], [28], where each node stores a scalar over a field from a codeword, we consider vector locally repairable codes, which have previously been considered in [13], [27]. Furthermore, in addition to the vector construction, the $(r,\delta,\alpha)$ codes we consider, as defined in Section II, allow for the possibility of $\alpha > \mathcal{M}/k$ and non-trivial locality, i.e., the possibility of $\delta > 2$. Thus, these codes are generalizations of the vector locally repairable codes given in [13], which considered only the $\delta = 2$ case. We note that we are particularly interested in vector locally repairable codes with multiple local parities. Among other advantages, codes having multiple local parities exhibit a stronger resilience to eavesdropping. In particular, as detailed in Sec. V, both scalar locally repairable codes and vector locally repairable codes with a single local parity have a poor secrecy rate in the presence of a passive eavesdropper.

We first derive an upper bound on the minimum distance of $(r,\delta,\alpha)$ codes, which also applies to non-linear codes. We follow the proof technique of [12], [13], which is given for the single local parity case, and modify it for multiple local parity nodes. The bound derived in this section gives the bound presented in [14] as a special case without the assumption of having a systematic code. As noted in [13], the bound on $d_{\min}$ establishes a resilience vs. per node storage trade-off, where the per node storage $\alpha$ can be increased over $\mathcal{M}/k$ to obtain a higher $d_{\min}$. This is of particular interest in the design of codes having both locality and strong resilience to node failures.

Next, we propose a general code construction which achieves the derived bound on $d_{\min}$. We use MRD codes along with MDS array codes to obtain this construction. In this section, we further introduce the notion of repair bandwidth for locally repairable codes and obtain an upper bound on the amount of data that can be stored in the DSS while supporting a given repair bandwidth. We note that the idea and analysis of repair bandwidth is similar to the classical work in the area of repair bandwidth efficient codes [9]. Here, the presence of multiple local parity nodes can be utilized to repair a local node efficiently by contacting more than $r$ nodes from the same group. The notion of bandwidth efficient node repair within a local group becomes important in Sec. V, where we study locally repairable codes under secrecy constraints.



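A. Upper bound on $d_{\min}$ for an $(r,\delta,\alpha)$ locally repairable code

We state a generic upper bound on the minimum distance $d_{\min}$ of an $(r,\delta,\alpha)$ code $\mathcal{C}$. (The definition of $d_{\min}$ is provided in Section II.) This will establish a trade-off between node failure resilience (i.e., $d_{\min}$) and per node storage ($\alpha$).

Theorem 11. Let $\mathcal{C}$ be an $(r,\delta,\alpha)$ locally repairable code over a finite field. Then, it follows that

$$d_{\min}(\mathcal{C}) \le n - \left\lceil \frac{\mathcal{M}}{\alpha} \right\rceil + 1 - \left(\left\lceil \frac{\mathcal{M}}{r\alpha} \right\rceil - 1\right)(\delta-1). \tag{18}$$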
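For concrete parameters, the right-hand side of (18) is straightforward to evaluate. The helper below is a small sketch (the function name is an assumption made here); it uses integer ceiling division to avoid floating-point rounding. The two parameter sets are the ones used in the examples of this section, for which the text reports $d_{\min} \le 5$ and $d_{\min} = 8$ respectively.

```python
def dmin_upper_bound(n, file_size, r, delta, alpha):
    """Right-hand side of (18) for an (r, delta, alpha) locally repairable code."""
    def ceil_div(a, b):
        return -(-a // b)  # ceiling of a / b for positive integers

    return (n - ceil_div(file_size, alpha) + 1
            - (ceil_div(file_size, r * alpha) - 1) * (delta - 1))


# n = 15, r = delta = 3, alpha = 4, file size 26 and 24.
print(dmin_upper_bound(15, 26, 3, 3, 4))  # 5
print(dmin_upper_bound(15, 24, 3, 3, 4))  # 8
```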



Proof: In order to get the aforementioned upper bound on the minimum distance of an $(r,\delta,\alpha)$ locally repairable code, we utilize the dual definition of the minimum distance of a code as given in (6). Similar to the proofs in [12] and [13], we construct a set $\mathcal{A} \subseteq [n]$ such that

$$H(\mathbf{s}_{\mathcal{A}}) < \mathcal{M}. \tag{19}$$

This along with (6) gives us an upper bound on $d_{\min}(\mathcal{C})$.

The construction of the set $\mathcal{A}$ is given in Fig. 3. Next, we show a lower bound on the size of the set $\mathcal{A}$, the output of the algorithm described in Fig. 3. Note that at each iteration of the while loop in Fig. 3, the algorithm increases the size of the set $\mathcal{A}_{i-1}$ by at most $r+\delta-1$ to get $\mathcal{A}_i$. For each $i$, define

$$a_i = |\mathcal{A}_i| - |\mathcal{A}_{i-1}| \tag{20}$$

and

$$h_i = H(\mathbf{s}_{\mathcal{A}_i}) - H(\mathbf{s}_{\mathcal{A}_{i-1}}). \tag{21}$$

Assume that the algorithm terminates at the $(\ell+1)$th iteration, i.e., $\mathcal{A} = \mathcal{A}_\ell$. Then it follows from (20) and (21) that

$$|\mathcal{A}| = |\mathcal{A}_\ell| = \sum_{i=1}^{\ell} a_i, \tag{22}$$

$$H(\mathbf{s}_{\mathcal{A}}) = H(\mathbf{s}_{\mathcal{A}_\ell}) = \sum_{i=1}^{\ell} h_i. \tag{23}$$


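 1: Set $\mathcal{A}_0 = \emptyset$ and $i = 1$.
 2: while $H(\mathbf{s}_{\mathcal{A}_{i-1}}) < \mathcal{M}$ do
 3:    Pick a coded block $\mathbf{s}_{j_i} \notin \mathcal{A}_{i-1}$ s.t. $|\Gamma(j_i) \setminus \mathcal{A}_{i-1}| \ge \delta-1$.
 4:    if $H(\mathbf{s}_{\mathcal{A}_{i-1}}, \mathbf{s}_{\Gamma(j_i)}) < \mathcal{M}$ then
 5:       set $\mathcal{A}_i = \mathcal{A}_{i-1} \cup \Gamma(j_i)$
 6:    else if $H(\mathbf{s}_{\mathcal{A}_{i-1}}, \mathbf{s}_{\Gamma(j_i)}) \ge \mathcal{M}$ and $\exists\, \mathcal{B} \subset \Gamma(j_i)$ s.t. $H(\mathbf{s}_{\mathcal{A}_{i-1}}, \mathbf{s}_{\mathcal{B}}) < \mathcal{M}$ then
 7:       set $\mathcal{A}_i = \mathcal{A}_{i-1} \cup \mathcal{B}$
 8:    else
 9:       $i = i + 1$, end while
10:    end if
11:    $i = i + 1$
12: end while
13: Output: $\mathcal{A} = \mathcal{A}_{i-1}$

Fig. 3: Construction of a set $\mathcal{A}$ with $H(\mathbf{s}_{\mathcal{A}}) < \mathcal{M}$ for an $(r,\delta,\alpha)$ code.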



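Consider two cases depending on the way the algorithm in Fig. 3 terminates:

Case 1: Assume that the algorithm terminates with the final set assigned at step 5, i.e., after adding $\Gamma(j_\ell)$ to $\mathcal{A}_{\ell-1}$. Now we have from the $(r,\delta,\alpha)$ property of the code that

$$\begin{aligned}
h_i &= H(\mathbf{s}_{\mathcal{A}_i}) - H(\mathbf{s}_{\mathcal{A}_{i-1}})\\
&= H(\mathbf{s}_{\mathcal{A}_{i-1} \cup (\mathcal{A}_i \setminus \mathcal{A}_{i-1})}) - H(\mathbf{s}_{\mathcal{A}_{i-1}})\\
&= H(\mathbf{s}_{\mathcal{A}_{i-1}}) + H(\mathbf{s}_{\mathcal{A}_i \setminus \mathcal{A}_{i-1}} \mid \mathbf{s}_{\mathcal{A}_{i-1}}) - H(\mathbf{s}_{\mathcal{A}_{i-1}})\\
&\le (a_i - \delta + 1)\alpha.
\end{aligned} \tag{24}$$

The last inequality follows from the fact that any block in $\Gamma(j_i)$ can be written as a function of any set of $r$ blocks in $\Gamma(j_i)$ and the fact that we pick $j_i$ in step 3 only if $|\Gamma(j_i) \setminus \mathcal{A}_{i-1}| \ge \delta-1$. At the end of the $i$th iteration, all the elements of $\Gamma(j_i)$ have been added to $\mathcal{A}_i$, out of which $a_i$ blocks are added at the $i$th iteration. These newly added blocks cannot contribute more than $(a_i - (\delta-1))\alpha$ to the entropy of the set $\mathcal{A}_i$, as $\delta-1$ of them are deterministic functions of the other newly added blocks of $\Gamma(j_i)$ and the blocks of $\Gamma(j_i)$ that were already present in $\mathcal{A}_{i-1}$. From (24), we have that

$$a_i \ge \frac{h_i}{\alpha} + \delta - 1. \tag{25}$$

Now using (22),

$$|\mathcal{A}| = |\mathcal{A}_\ell| = \sum_{i=1}^{\ell} a_i \ge \sum_{i=1}^{\ell}\left(\frac{h_i}{\alpha} + \delta - 1\right) = \frac{1}{\alpha}\sum_{i=1}^{\ell} h_i + (\delta-1)\ell. \tag{26}$$

Similar to the proof of Papailiopoulos et al. [13], we have

$$\frac{1}{\alpha}\sum_{i=1}^{\ell} h_i \ge \left\lceil \frac{\mathcal{M}}{\alpha} \right\rceil - 1 \tag{27}$$

and

$$\ell \ge \left\lceil \frac{\mathcal{M}}{r\alpha} \right\rceil - 1. \tag{28}$$

It follows from (26), (27), and (28) that

$$|\mathcal{A}| \ge \left\lceil \frac{\mathcal{M}}{\alpha} \right\rceil - 1 + \left(\left\lceil \frac{\mathcal{M}}{r\alpha} \right\rceil - 1\right)(\delta-1). \tag{29}$$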



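Case 2: The proof of this case is similar to that in [12] except for a few minor modifications. Consider that the algorithm terminates with the final set assigned at step 7 in the $\ell$th iteration. Since it reaches step 7, we have

$$H(\mathbf{s}_{\mathcal{A}_{\ell-1} \cup \Gamma(j_\ell)}) \ge \mathcal{M}. \tag{30}$$

As the increment in the entropy is at most $r\alpha$ at each iteration, we have

$$\ell \ge \left\lceil \frac{\mathcal{M}}{r\alpha} \right\rceil. \tag{31}$$

For $i \le \ell-1$, from (24),

$$a_i \ge \frac{h_i}{\alpha} + \delta - 1. \tag{32}$$

For $i = \ell$,

$$a_\ell \ge \frac{h_\ell}{\alpha}. \tag{33}$$

Next, it follows from (22), (23), (32), and (33) that

$$\begin{aligned}
|\mathcal{A}_\ell| = \sum_{i=1}^{\ell} a_i &\ge \frac{1}{\alpha}\sum_{i=1}^{\ell} h_i + (\delta-1)(\ell-1) && (34)\\
&\ge \left\lceil \frac{\mathcal{M}}{\alpha} \right\rceil - 1 + \left(\left\lceil \frac{\mathcal{M}}{r\alpha} \right\rceil - 1\right)(\delta-1), && (35)
\end{aligned}$$

where (35) follows from (31) and (27). Now combining (6), (19), and (35), we get

$$d_{\min}(\mathcal{C}) \le n - \left\lceil \frac{\mathcal{M}}{\alpha} \right\rceil + 1 - \left(\left\lceil \frac{\mathcal{M}}{r\alpha} \right\rceil - 1\right)(\delta-1). \tag{36}$$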



Using $\alpha = (1+\epsilon)\frac{\mathcal{M}}{k}$ for the bound given in the above theorem, we obtain $d_{\min} \le n - \left\lceil \frac{k}{1+\epsilon} \right\rceil + 1 - \left(\left\lceil \frac{k}{r(1+\epsilon)} \right\rceil - 1\right)(\delta-1)$. For the special case of $\delta = 2$, this bound matches with the bound in [13]. For the case of $\alpha = \mathcal{M}/k$, i.e., the minimum storage point for locally repairable codes, the bound reduces to $d_{\min} \le n - k + 1 - \left(\lceil k/r \rceil - 1\right)(\delta-1)$, which is coincident with the bound presented in [14].

B. Construction of $d_{\min}$-optimal locally repairable codes

In this subsection we present a construction of an $(r,\delta,\alpha)$ locally repairable code which attains the bound given in Theorem 11. Consider a file $\mathbf{f}$, to be stored on the DSS, of size $\mathcal{M} \ge r\alpha$. We encode the file in two steps before storing it on the DSS. First, the file is encoded using an MRD code. The codeword (over $\mathbb{F}_{q^N}$) of the MRD code is then divided into local groups, and each local group is then encoded using an MDS array code over $\mathbb{F}_q$. This construction can be viewed as a generalization of the construction proposed in [30]. In particular, let $\mathcal{C}^{\mathrm{MRD}}$ be an $[N \times m, N\mathcal{M}, m-\mathcal{M}+1]$ Gabidulin MRD code, $N \ge m$, where each codeword is considered as a vector of length $m$ over $\mathbb{F}_{q^N}$. We take $m = gr\alpha$, where $g$ denotes the number of local groups in the system, which is a system parameter. A codeword $\mathbf{c} \in \mathcal{C}^{\mathrm{MRD}}$ is partitioned into $g$ groups, each of size $r\alpha$, and each group is stored on a different set of $r$ nodes, $\alpha$ symbols per node. In other words, the output of the first encoding step generates the encoded data stored on $rg$ nodes, each one containing $\alpha$ symbols of a (folded) MRD codeword. In the second stage of the encoding process, we generate $\delta-1$ parity nodes per group by applying an $(r+\delta-1, r)$ MDS array code over $\mathbb{F}_q$ on each local group of $r$ nodes, treating these $r$ nodes as input data blocks for the MDS array code. At the end of the second round of encoding, we have $n = (r+\delta-1)g = \frac{m}{\alpha} + \frac{m}{r\alpha}(\delta-1)$ nodes, each storing $\alpha$ symbols over $\mathbb{F}_{q^N}$, partitioned into $g$ local groups, each of size $r+\delta-1$. We denote the concatenated code by $\mathcal{C}^{\mathrm{loc}}$. Next, we prove that the proposed locally repairable code $\mathcal{C}^{\mathrm{loc}}$ indeed has the maximum minimum distance as given in (18).

Theorem 12. The proposed $(r,\delta,\alpha)$ locally repairable code $\mathcal{C}^{\mathrm{loc}}$ attains the bound (18), i.e., its minimum distance $d_{\min}(\mathcal{C}^{\mathrm{loc}})$ satisfies

$$d_{\min}(\mathcal{C}^{\mathrm{loc}}) = n - \left\lceil \frac{\mathcal{M}}{\alpha} \right\rceil + 1 - \left(\left\lceil \frac{\mathcal{M}}{r\alpha} \right\rceil - 1\right)(\delta-1).$$

Proof: Recall that a codeword of a Gabidulin MRD code can be considered as an evaluation of a linearized polynomial $f(y) \in \mathbb{F}_{q^N}[y]$ on $m$ points $\{y_1, \ldots, y_m\}$ that are linearly independent over $\mathbb{F}_q$, where $y_i \in \mathbb{F}_{q^N}$, $1 \le i \le m$. The polynomial $f(\cdot)$ has the original data symbols that need to be encoded as its coefficients. Note that for reconstruction of the original data it is sufficient to have evaluations of $f(\cdot)$ on $\mathcal{M}$ points in $\mathbb{F}_{q^N}$, $\{f(p_1), \ldots, f(p_{\mathcal{M}})\}$, such that $\{p_1, \ldots, p_{\mathcal{M}}\}$ are linearly independent over $\mathbb{F}_q$. (See, e.g., E3.) Utilizing the $\mathbb{F}_q$-linearity property of $f(y)$, the MDS property of the array code used in the second encoding stage, and the fact that $\mathcal{M} \ge r\alpha$, we have that any $r$ nodes in any group contain evaluations of $f(y)$ at $r\alpha$ points that are linearly independent over $\mathbb{F}_q$.

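Let $i$ and $j$ be two integers such that $\mathcal{M} = m - r\alpha(i+1) + j$, $0 \le i \le \frac{m}{r\alpha} - 1$, and $0 \le j \le r\alpha - 1$. Now it follows from (18) that

$$\begin{aligned}
d_{\min}(\mathcal{C}^{\mathrm{loc}}) - 1 &\le \frac{m}{\alpha} + \frac{m}{r\alpha}(\delta-1) - \left\lceil \frac{\mathcal{M}}{\alpha} \right\rceil - \left(\left\lceil \frac{\mathcal{M}}{r\alpha} \right\rceil - 1\right)(\delta-1)\\
&= \frac{m}{\alpha} - \left\lceil \frac{\mathcal{M}}{\alpha} \right\rceil + \left(\frac{m}{r\alpha} - \left\lceil \frac{\mathcal{M}}{r\alpha} \right\rceil + 1\right)(\delta-1).
\end{aligned} \tag{38}$$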
Next, we treat the $j = 0$ and $j > 0$ cases separately and show that $\mathcal{C}^{\mathrm{loc}}$ has optimal minimum distance in both these cases.

Case 1 ($j = 0$): In this case $\left\lceil \frac{\mathcal{M}}{\alpha} \right\rceil = \frac{m}{\alpha} - r(i+1)$ and $\left\lceil \frac{\mathcal{M}}{r\alpha} \right\rceil = \frac{m}{r\alpha} - (i+1)$. From (38) we have

$$d_{\min} - 1 \le r(i+1) + (i+1)(\delta-1) + (\delta-1) = (i+1)(r+\delta-1) + (\delta-1). \tag{39}$$

Now, we show that any $(i+1)(r+\delta-1) + (\delta-1)$ node erasures can be tolerated by $\mathcal{C}^{\mathrm{loc}}$. In other words, even after $(i+1)(r+\delta-1) + (\delta-1)$ erasures, we have evaluations of $f(y)$ at $\mathcal{M}$ linearly independent points over $\mathbb{F}_q$. Here, we point out that the worst case erasure pattern is when the erasures appear in the smallest possible number of groups and the number of erasures inside a local group is maximal. Therefore, we consider the case when all the symbols in $i+1$ groups are erased, and there is a group with $\delta-1$ erased nodes. Due to the application of the MDS array code in each local group, $\delta-1$ or fewer erasures in a particular group do not affect the number of evaluations of $f(y)$ on linearly independent points that this group has to offer, i.e., $r\alpha$. So in this case, the number of the remaining symbols of an MRD codeword which correspond to linearly independent points is $m - r\alpha(i+1) = \mathcal{M}$.

Case 2 ($j > 0$): In this case $\left\lceil \frac{\mathcal{M}}{\alpha} \right\rceil = \frac{m}{\alpha} - r(i+1) + \left\lceil \frac{j}{\alpha} \right\rceil$ and $\left\lceil \frac{\mathcal{M}}{r\alpha} \right\rceil = \frac{m}{r\alpha} - (i+1) + \left\lceil \frac{j}{r\alpha} \right\rceil = \frac{m}{r\alpha} - i$. It follows from (38) that

$$d_{\min} - 1 \le r(i+1) - \left\lceil \frac{j}{\alpha} \right\rceil + (i+1)(\delta-1) = (i+1)(r+\delta-1) - \left\lceil \frac{j}{\alpha} \right\rceil. \tag{40}$$



As in the previous case, we show that the original data can be reconstructed even after the failure of any $(i+1)(r+\delta-1) - \left\lceil \frac{j}{\alpha} \right\rceil$ nodes. We again establish this by showing that we can find evaluations of $f(y)$ at $\mathcal{M}$ linearly independent points from the remaining nodes in the DSS. As previously, we consider the worst case erasure pattern, where the erasures appear in the smallest possible number of groups and the number of erasures inside a group is maximal. Assume that all the symbols in $i$ local groups are erased, and there is a local group with $r+\delta-1-\left\lceil \frac{j}{\alpha} \right\rceil$ erased nodes. In this case the available number of evaluations of $f(y)$ at linearly independent points is

$$m - r\alpha(i+1) + \left((r+\delta-1) - \left(r+\delta-1-\left\lceil \frac{j}{\alpha} \right\rceil\right)\right)\alpha = m - r\alpha(i+1) + \left\lceil \frac{j}{\alpha} \right\rceil \alpha \ge \mathcal{M}. \tag{41}$$

Therefore, the original data can be recovered even when $(i+1)(r+\delta-1) - \left\lceil \frac{j}{\alpha} \right\rceil$ nodes fail.

This establishes the optimality of $\mathcal{C}^{\mathrm{loc}}$ in terms of minimum distance. ■

[Figure: node layout of the code. Local group 1: nodes 1-3 store $(a_1,\ldots,a_4)$, $(a_5,\ldots,a_8)$, $(a_9,\ldots,a_{12})$, and nodes 4-5 store local parities. Local group 2: nodes 6-8 store $(b_1,\ldots,b_4)$, $(b_5,\ldots,b_8)$, $(b_9,\ldots,b_{12})$, and nodes 9-10 store local parities. Local group 3: nodes 11-13 store $(c_1,\ldots,c_4)$, $(c_5,\ldots,c_8)$, $(c_9,\ldots,c_{12})$, and nodes 14-15 store local parities.]

Fig. 4: Example of an $(r = 3, \delta = 3, \alpha = 4)$ locally repairable code with $n = 15$ and $\mathcal{M} = 26$. The code has minimum distance 5.

Next, we illustrate the construction of $\mathcal{C}^{\mathrm{loc}}$ with the help of an example.

Example 13. Let us consider a DSS with $\mathcal{M} = 26$, $\delta = r = g = 3$, $\alpha = 4$, $m = rg\alpha = 36$. Then $n = 15$ and from (18), $d_{\min} \le 5$. Let $(a_1, \ldots, a_{12}, b_1, \ldots, b_{12}, c_1, \ldots, c_{12})$ be a codeword of an $[N \times 36, N \cdot 26, 11]$ MRD code, which is obtained by encoding $\mathcal{M} = 26$ symbols over $\mathbb{F}_{q^N}$ of the original file. Here we assume that $N \ge 36$. The MRD codeword is then divided into three groups $(a_1, \ldots, a_{12})$, $(b_1, \ldots, b_{12})$, and $(c_1, \ldots, c_{12})$. Encoded symbols in each group are stored on three storage nodes as shown in Fig. 4. In the second stage of encoding, an MDS array code is applied on each local group to obtain $\delta - 1 = 2$ parity nodes per local group. The coding scheme is illustrated in Fig. 4.

Note that any three nodes in a local group provide evaluations of the linearized polynomial $f(y)$ associated with the data symbols at 12 linearly independent points over $\mathbb{F}_q$, and the polynomial $f(y)$ can be recovered from its evaluations at 26 linearly independent points over $\mathbb{F}_q$. Here, we illustrate that any four node erasures can be tolerated by the coding scheme employed in this example. If there are at most two erasures in each group, then we can obtain evaluations of $f(y)$ at 12 linearly independent points from each local group, thus $36 \ge 26 = \mathcal{M}$ points from all three local groups. If there is a group with three node erasures, then this group can provide evaluations of $f(y)$ at only 8 linearly independent points. However, the other two groups can give evaluations of $f(y)$ at 24 additional linearly independent points, which makes the total number of usable evaluations $32 \ge 26$. Finally we consider the worst case mentioned in the proof of Theorem 12. Suppose there is a group with four erased nodes; then this local group provides evaluations of $f(y)$ at 4 linearly independent points, which, taking into account the contribution from the other two local groups (additional 24 points), gives evaluations of $f(y)$ at $28 \ge 26 = \mathcal{M}$ linearly independent points. Therefore, the original file can be reconstructed even after four nodes fail.
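The case analysis of Example 13 can be checked exhaustively. The sketch below is a counting model only, not an implementation of the MRD or MDS array codes: it assumes, as stated in the example, that a local group with $s$ surviving nodes contributes $\min(s, r)\cdot\alpha$ evaluations at linearly independent points, and that the file is recoverable once $\mathcal{M}$ such evaluations are available. All identifiers are chosen here for illustration.

```python
from itertools import combinations

R, DELTA, ALPHA, GROUPS, FILE_SIZE = 3, 3, 4, 3, 26
GROUP_SIZE = R + DELTA - 1          # 5 nodes per local group
N = GROUPS * GROUP_SIZE             # 15 nodes in total


def evaluations_left(erased):
    """Independent evaluations of f(y) available after erasing the given nodes."""
    total = 0
    for g in range(GROUPS):
        nodes = range(g * GROUP_SIZE, (g + 1) * GROUP_SIZE)
        survivors = sum(1 for node in nodes if node not in erased)
        total += min(survivors, R) * ALPHA   # any r nodes already give r*alpha points
    return total


def tolerates_all(t):
    """True if every pattern of t node erasures leaves at least FILE_SIZE evaluations."""
    return all(evaluations_left(set(e)) >= FILE_SIZE
               for e in combinations(range(N), t))


assert tolerates_all(4)        # any four erasures are tolerated
assert not tolerates_all(5)    # erasing a whole group leaves only 24 < 26 points
```

Under this model the minimum distance is 5, in agreement with the bound (18) for these parameters.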






C. File size upper bound for repair bandwidth efficient locally 
repairable codes 



In this subsection, we introduce the notion of repair bandwidth for locally repairable codes. In a naive repair process for a locally repairable code, a newcomer contacts $r$ nodes in its local group and downloads all the data stored on these nodes. The newcomer then regenerates the data stored on the failed node and stores it for future operations. Following the line of work on bandwidth efficient repair in DSS due to [9], we allow a newcomer to contact more than $r$ nodes in its local group in order to repair the failed node. The motivation behind this is to lower the repair bandwidth of a locally repairable code. (This also improves the secrecy capacity of such codes, as detailed in Section V.)

In the rest of this section, we restrict ourselves to locally repairable codes that have the maximum possible minimum distance as described in (18). Since the upper bound on the minimum distance for locally repairable codes in (18) is achievable only by those codes that have disjoint local groups, we focus only on such codes. Here, we assume that $(r+\delta-1) \mid n$. Let $\mathcal{G}_1, \ldots, \mathcal{G}_g$ denote $g = \frac{n}{r+\delta-1}$ disjoint sets of indices of storage nodes, each of size $r+\delta-1$. Each set represents a local group, and a failed node in a particular local group is repaired by contacting $d$ remaining nodes within that group, where $r \le d \le r+\delta-2$. During the node repair process a newcomer downloads $\beta$ symbols from each of these $d$ nodes.

Next, we perform the standard min-cut max-flow based analysis for locally repairable DSS by mapping it to a multicasting problem on a dynamic information flow graph. (The information flow graph representing a locally repairable DSS is a modification of the information flow graph for classical DSS analyzed in [9] and is first introduced in [13] for naive repair, where the newcomer contacts $r$ nodes.) We assume a sequence of node failures and node repairs as shown in Fig. 5. We consider that each local group encounters the same sequence of node failures and of node repairs performed as a result of these failures. Each data collector contacts $n - d_{\min} + 1$ storage nodes for data reconstruction. A data collector is associated with the nodes it contacts for data reconstruction, $(\mathcal{K}_1, \mathcal{K}_2, \ldots, \mathcal{K}_g)$. Here $\mathcal{K}_i \subseteq \mathcal{G}_i$ is the set of indices of nodes that a data collector contacts in the $i$th local group and $\sum_{i=1}^{g} |\mathcal{K}_i| = n - d_{\min} + 1$. Next we derive an upper bound on the amount of data that can be stored on the DSS while ensuring the $n - d_{\min} + 1$ property, i.e., each set of $n - d_{\min} + 1$ nodes allows a data collector to recover the original file. This upper bound is used to derive a repair bandwidth vs. per node storage trade-off for minimum distance optimal codes with $(r,\delta,\alpha)$ locality. In what follows, we add two more parameters to the representation of locally repairable codes and denote them by the tuple $(r,\delta,\alpha,d,\beta)$.

Theorem 14. For an $(n,k)$ DSS employing an $(r,\delta,\alpha,d,\beta)$ locally repairable code, we have

$$\mathcal{M} \le \min\left\{r\alpha, \sum_{i=0}^{h-1} \min\{\max\{(d-i)\beta, 0\}, \alpha\}\right\} + \sum_{j=1}^{\left\lfloor \frac{n-d_{\min}+1}{r+\delta-1} \right\rfloor} \min\left\{r\alpha, \sum_{i=0}^{r+\delta-2} \min\{\max\{(d-i)\beta, 0\}, \alpha\}\right\}, \tag{42}$$

where $h = n - d_{\min} + 1 - (r+\delta-1)\left\lfloor \frac{n-d_{\min}+1}{r+\delta-1} \right\rfloor$.

Proof: Write $t = \left\lfloor \frac{n-d_{\min}+1}{r+\delta-1} \right\rfloor$ for brevity. Consider a data collector with $\mathcal{K}_1 = \mathcal{G}_1, \mathcal{K}_2 = \mathcal{G}_2, \ldots, \mathcal{K}_t = \mathcal{G}_t$, and $\mathcal{K}_{t+1} \subset \mathcal{G}_{t+1}$ s.t. $|\mathcal{K}_{t+1}| = h$. Now, the bound in (42) follows by finding various cuts in the information flow graph (Fig. 5). For each group, we consider cuts similar to the ones given in [9]. Here, the data collector connects to $h$ nodes for the first term in (42) and to $r+\delta-1$ nodes for each of the terms in the summation of the second term in (42). Now, consider the $i$-th node out of the $k$ nodes that the data collector connects to in a particular group. (Here, $k = h$ or $k = r+\delta-1$, as described above.) A cut between $x_i^{\mathrm{in}}$ and $x_i^{\mathrm{out}}$ for each node gives a cut-value of $\alpha$. On the other hand, for $i = 0, \ldots, k-1$, if the cut is such that $x_i^{\mathrm{in}}$ belongs to the data collector side, we consider that $(d-i)$ live nodes are connected together with $i$ nodes that have been previously repaired. In our setup, for such a cut, the cut-value evaluates to $\max\{(d-i)\beta, 0\}$, as for $i \ge d$ the repair node is considered to contact only the previously repaired nodes, and hence does not contribute to the maximum flow. ■
Note that the codes under consideration have the property that each local group has entropy $r\alpha$ and any set of $r$ nodes has $r\alpha$ independent symbols. (See the definition of $(r,\delta)$-locality in Section II.) Therefore, node repairs within each local group have to ensure this property. This implies that each local group and its repair can be related to an $(r+\delta-1, r, d, \alpha, \beta)$ MSR regenerating code with a file of size $r\alpha$. Hence, when a data collector connects to any $r$ nodes in a group, it can get all the information that particular group has to offer. Therefore, similar to the analysis given in [9] for the classical setup, the parameters need to satisfy

$$r\alpha = \sum_{i=0}^{r-1} \min\{(d-i)\beta, \alpha\}, \tag{43}$$

which leads to the requirement of $(d-i)\beta \ge \alpha$ for each $i = 0, \ldots, r-1$. Then, the minimum $\beta$ is obtained as $\beta^* = \frac{\alpha}{d-r+1}$. When node repairs are performed by downloading $\beta^*$ symbols from $d$ nodes for each failed node, the bound in (42) reduces to

$$\mathcal{M} \le \left\lfloor \frac{n-d_{\min}+1}{r+\delta-1} \right\rfloor r\alpha + \min\{h, r\}\alpha, \tag{44}$$

where $h$ is as defined in Theorem 14. This establishes the file size bound for bandwidth efficient $d_{\min}$-optimal locally repairable codes.

Fig. 5: Flow graph for an $(r,\delta)$ locally repairable code. In this graph, node pairs $\{\Gamma_i^{\mathrm{in}}, \Gamma_i^{\mathrm{out}}\}_{i=1}^{g}$ with an edge of capacity $r\alpha$ enforce the requirement that each local group has $r\alpha$ entropy. Here $\eta = r+\delta-1$.
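The reduction of (42) to (44) at $\beta^* = \alpha/(d-r+1)$ can be checked numerically. The sketch below evaluates both expressions over a small parameter sweep; the function names and the swept values are assumptions made here for illustration, and exact fractions are used because $\beta^*$ need not be an integer.

```python
from fractions import Fraction


def bound_42(n, dmin, r, delta, alpha, d, beta):
    """Right-hand side of (42) for an (r, delta, alpha, d, beta) code."""
    group = r + delta - 1
    full_groups = (n - dmin + 1) // group
    h = n - dmin + 1 - group * full_groups

    def repair_cut(count):
        # sum over i = 0..count-1 of min{max{(d - i) * beta, 0}, alpha}
        return sum(min(max((d - i) * beta, 0), alpha) for i in range(count))

    return (min(r * alpha, repair_cut(h))
            + full_groups * min(r * alpha, repair_cut(group)))


def bound_44(n, dmin, r, delta, alpha):
    """Right-hand side of (44)."""
    group = r + delta - 1
    full_groups = (n - dmin + 1) // group
    h = n - dmin + 1 - group * full_groups
    return full_groups * r * alpha + min(h, r) * alpha


n, r, delta, alpha = 15, 3, 3, 4
for d in range(r, r + delta - 1):              # r <= d <= r + delta - 2
    beta_star = Fraction(alpha, d - r + 1)     # minimum beta satisfying (43)
    for dmin in range(1, n + 1):
        assert bound_42(n, dmin, r, delta, alpha, d, beta_star) == \
               bound_44(n, dmin, r, delta, alpha)
```

For instance, with $d_{\min} = 8$ and $d = 4$ (so $\beta^* = 2$), both expressions evaluate to 24.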



D. Construction of repair bandwidth efficient and $d_{\min}$-optimal locally repairable codes

Now it is clear that node repair within a local group is performed by treating each local group as an $(r+\delta-1, r, d, \alpha, \beta^*)$ MSR regenerating code. Using random linear network coding (RLNC) over a large enough field, the bound in (44) is achievable [9], [40]. Since we do not get any reduction in repair bandwidth ($\beta$) by setting $\alpha$ greater than $\frac{\mathcal{M}}{k}$, we focus on the case when $\alpha = \frac{\mathcal{M}}{k}$ for the construction presented here. Remarkably, the code presented in Section IV-B, when an MSR code is employed for the second encoding stage, achieves the bound (44) when we have $\alpha \mid \mathcal{M}$. We establish this claim in the following theorem.

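Theorem 15. Let $\mathcal{C}^{\mathrm{loc}}$ be a code obtained from the construction described in Sec. IV-B with $\alpha = \frac{\mathcal{M}}{k}$ and an MSR regenerating code employed in the second encoding stage to generate local parities. If $\alpha \mid \mathcal{M}$, then $\mathcal{C}^{\mathrm{loc}}$ attains the bound (44), i.e., the size of a file that can be stored by using this code satisfies

$$\mathcal{M} = \left\lfloor \frac{n-d_{\min}+1}{r+\delta-1} \right\rfloor r\alpha + \min\{h, r\}\alpha,$$

where $h = n - d_{\min} + 1 - (r+\delta-1)\left\lfloor \frac{n-d_{\min}+1}{r+\delta-1} \right\rfloor$.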



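Proof: Similar to the proof of Theorem 12, we consider two cases depending on the difference between the length $m$ of the MRD codeword (the output of the first stage of encoding) and the file size $\mathcal{M}$. We first consider the case $r\alpha \mid (m - \mathcal{M})$. (This corresponds to Case 1 in the proof of Theorem 12.) Recall that in this case we have $\mathcal{M} = m - r\alpha(i+1) = r\alpha(g-i-1)$, $n = (r+\delta-1)g$, and $d_{\min} - 1 = (i+1)(r+\delta-1) + (\delta-1)$, so that

$$\begin{aligned}
h &= n - d_{\min} + 1 - (r+\delta-1)\left\lfloor \frac{n-d_{\min}+1}{r+\delta-1} \right\rfloor \\
  &= (r+\delta-1)g - (i+1)(r+\delta-1) - (\delta-1) - (r+\delta-1)\left\lfloor g-(i+1)-\frac{\delta-1}{r+\delta-1} \right\rfloor \\
  &= (r+\delta-1)\bigl[g-(i+1)\bigr] - (\delta-1) - (r+\delta-1)\bigl[g-(i+1)-1\bigr] = r. \qquad (45)
\end{aligned}$$

For $h = r$, the right hand side of (44) becomes $\left\lfloor \frac{n-d_{\min}+1}{r+\delta-1} \right\rfloor r\alpha + r\alpha = \alpha r\,(g-(i+1)-1+1) = \alpha r\,(g-(i+1)) = \mathcal{M}$, the size of the file that is encoded using $\mathcal{C}^{\text{loc}}$.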

Now we consider the second case considered in the proof of Theorem 12, where $r\alpha \nmid (m - \mathcal{M})$ and $\mathcal{M} = m - r\alpha(i+1) + b\alpha = (g-i-1)r\alpha + b\alpha$ for some integer $1 \le b \le r-1$. Here, we have used the fact that $m = gr\alpha$. In this case, $d_{\min} - 1 = (i+1)(r+\delta-1) - b$, and $h = (r+\delta-1)g - (i+1)(r+\delta-1) + b - (r+\delta-1)(g-i-1) = b < r$, since

$$\left\lfloor \frac{n-d_{\min}+1}{r+\delta-1} \right\rfloor = \left\lfloor g-i-1+\frac{b}{r+\delta-1} \right\rfloor = g-i-1.$$

Therefore the upper bound on the file size in (44) becomes $\left\lfloor \frac{n-d_{\min}+1}{r+\delta-1} \right\rfloor r\alpha + h\alpha = (g-i-1)r\alpha + b\alpha = gr\alpha - (i+1)r\alpha + b\alpha = \mathcal{M}$. This establishes that $\mathcal{C}^{\text{loc}}$, when an MSR code is used to generate its local parities, attains the bound given in (44). ■
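The two cases of the proof can also be checked numerically. The following sketch assumes $\alpha \mid \mathcal{M}$ and $m = gr\alpha$, takes the minimum distance from the two case formulas above, and evaluates the right-hand side of (44) with $h$ read as the remainder of $n-d_{\min}+1$ modulo $r+\delta-1$. The helper names are hypothetical and the parameter ranges in the sweep are arbitrary small values chosen for illustration.

```python
def d_min_of_construction(M, g, r, delta, alpha):
    """Minimum distance from the two cases of the proof (assumes alpha divides M)."""
    q, rem = divmod(M, r * alpha)
    i = g - q - 1
    if rem == 0:
        # Case 1: r*alpha divides m - M, so d_min - 1 = (i+1)(r+delta-1) + (delta-1)
        return (i + 1) * (r + delta - 1) + (delta - 1) + 1
    # Case 2: M = (g-i-1) r alpha + b alpha, so d_min - 1 = (i+1)(r+delta-1) - b
    b = rem // alpha
    return (i + 1) * (r + delta - 1) - b + 1


def file_size_bound(n, d_min, r, delta, alpha):
    """Right-hand side of (44), with h the remainder of n-d_min+1 mod (r+delta-1)."""
    full_groups, h = divmod(n - d_min + 1, r + delta - 1)
    return full_groups * r * alpha + min(h, r) * alpha


def bound_is_attained(g, r, delta, alpha):
    """Check that every admissible file size M meets the bound with equality."""
    n = g * (r + delta - 1)
    m = g * r * alpha
    return all(
        file_size_bound(n, d_min_of_construction(M, g, r, delta, alpha), r, delta, alpha) == M
        for M in range(alpha, m + 1, alpha)
    )


print(all(bound_is_attained(g, r, delta, alpha)
          for g in range(1, 5) for r in range(1, 5)
          for delta in range(2, 5) for alpha in range(1, 4)))
```

The sweep is only a sanity check of the arithmetic in the two cases; it does not construct any code.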
In the following example, we illustrate the aforementioned construction of repair bandwidth efficient locally repairable codes for a particular choice of system parameters.

Example 16. Consider the following system parameters:

$$(\mathcal{M}, n, \alpha, r, \delta, m, N) = (24, 15, 4, 3, 3, 36, 36). \qquad (46)$$

First, $\mathcal{M} = 24$ symbols over $\mathbb{F}_{q^{36}}$ are encoded to a codeword represented by $(a_1, \ldots, a_{12}, b_1, \ldots, b_{12}, c_1, \ldots, c_{12})$ using the $[36 \times 36,\ 36 \cdot 24,\ 13]$ MRD code. Here the 36 encoded symbols over $\mathbb{F}_{q^{36}}$ are evaluations of a linearized polynomial at 36 points that are linearly independent over $\mathbb{F}_q$. The encoded symbols are partitioned into 3 groups, each of size 12, and stored on 9 nodes as shown in Fig. 6. We further add 6 nodes, 2 nodes for each local group, using a (5, 3) exact repairable MSR code with $\alpha = 4$ (e.g., a (5, 3)-zigzag code).

Fig. 6: Example of a repair bandwidth efficient $(r = 3, \delta = 3)$ locally repairable code with $\mathcal{M} = 24$ and $n = 15$. The code has $d_{\min} = 8$.

From ( 1781 ), the minimum distance of this code is at most 8. In 
fact, it is exactly 8, since the data polynomial is still evaluated at 24 points that are linearly independent over $\mathbb{F}_q$ even when any 7 nodes fail. Moreover, each failed node can be repaired bandwidth efficiently, as an exact repairable MSR code is used within each local group.
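The claim that 24 independent evaluations survive any 7 node failures can be checked by brute force under a simple counting model. The sketch below assumes that a local group with $a$ surviving nodes contributes $\min(a, r)\,\alpha$ linearly independent evaluations (the MDS property of the local code) and that contributions of different groups remain independent; both are modelling assumptions made here for illustration, and all names are hypothetical.

```python
from itertools import combinations

R, DELTA, ALPHA, GROUPS, FILE_SIZE = 3, 3, 4, 3, 24
GROUP_LEN = R + DELTA - 1            # 5 nodes per local group
N = GROUPS * GROUP_LEN               # 15 nodes in total


def surviving_evaluations(failed):
    """Count independent evaluations left after the nodes in `failed` are lost."""
    total = 0
    for g in range(GROUPS):
        nodes = range(g * GROUP_LEN, (g + 1) * GROUP_LEN)
        alive = sum(1 for v in nodes if v not in failed)
        total += min(alive, R) * ALPHA   # a group never offers more than r*alpha symbols
    return total


def recoverable_under_all(num_failures):
    """True if every failure pattern of this size leaves at least FILE_SIZE evaluations."""
    return all(surviving_evaluations(set(pattern)) >= FILE_SIZE
               for pattern in combinations(range(N), num_failures))


print(recoverable_under_all(7), recoverable_under_all(8))
```

Under this model every pattern of 7 failures leaves at least 24 evaluations, while some pattern of 8 failures (for instance, a whole group plus three nodes of another group) does not, which is consistent with $d_{\min} = 8$.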



V. Secrecy in Locally Repairable DSS 

In this section, we analyze locally repairable DSS in the presence of secrecy constraints. The eavesdropping model is as defined in Section II. We first derive a generic upper bound on the secrecy capacity of an $(r, \delta, \alpha, d, \beta)$ locally repairable code, which we later specialize to specific cases of system parameters. While addressing these specific cases, we also present secure coding constructions that achieve the respective upper bounds for certain parameters.

Consider a data collector that contacts $n - d_{\min} + 1$ nodes. Let $\mathcal{K}_i$ denote the indices of the nodes contacted by the data collector in the $i$-th local group, and let $\mathcal{K} = \cup_{i=1}^{g} \mathcal{K}_i$ with $|\mathcal{K}| = n - d_{\min} + 1$. Similar to Section III-A, we classify eavesdropped nodes into two classes: $\mathcal{E}_1$ contains the storage-eavesdropped nodes ($\ell_1$ nodes in total) and $\mathcal{E}_2$ contains the download-eavesdropped nodes ($\ell_2$ nodes in total). Considering local group $i$, we denote the set of indices of storage-eavesdropped nodes as $\mathcal{E}_1^i$ and that of download-eavesdropped nodes as $\mathcal{E}_2^i$. Here, we have $\mathcal{E}_1 = \cup_{i=1}^{g} \mathcal{E}_1^i$, $\mathcal{E}_2 = \cup_{i=1}^{g} \mathcal{E}_2^i$, $\sum_{i=1}^{g} l_1^i = \ell_1$, and $\sum_{i=1}^{g} l_2^i = \ell_2$, where $l_1^i = |\mathcal{E}_1^i|$ and $l_2^i = |\mathcal{E}_2^i|$. We let $\mathcal{X}$ denote the set of tuples $(\{\mathcal{E}_1^i\}_{i=1}^{g}, \{\mathcal{E}_2^i\}_{i=1}^{g}, \{\mathcal{K}_i\}_{i=1}^{g})$ satisfying these requirements.
In the following, we provide our generic upper bound on the secrecy capacity of $(r, \delta, \alpha, d, \beta)$ locally repairable codes against an $(\ell_1, \ell_2)$-eavesdropper.

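Theorem 17. For an $(n, k)$ DSS employing an $(r, \delta, \alpha, d, \beta)$ locally repairable code that is secure against an $(\ell_1, \ell_2)$-eavesdropper, we have

$$\mathcal{M}^s \le \min_{(\{\mathcal{E}_1^i\}_{i=1}^{g}, \{\mathcal{E}_2^i\}_{i=1}^{g}, \{\mathcal{K}_i\}_{i=1}^{g}) \in \mathcal{X}} \ \sum_{i=1}^{g} H\bigl(\mathbf{s}_{\mathcal{K}_i} \mid \mathbf{s}_{\mathcal{E}_1^i}, \mathbf{d}_{\mathcal{E}_2^i}\bigr). \qquad (47)$$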

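Proof: Without loss of generality, for the purpose of obtaining an upper bound on the secrecy capacity, we can focus on sets of indices $\{\mathcal{E}_1^i\}_{i=1}^{g}$ and $\{\mathcal{E}_2^i\}_{i=1}^{g}$ such that $|\mathcal{E}_1^i \cup \mathcal{E}_2^i| \le r$, as eavesdropping $r$ nodes in a group gives the eavesdropper all the information that the particular group has to offer. As introduced in Section II, we represent the stored and downloaded content at node $i$ (set $\mathcal{A}$) as $\mathbf{s}_i$ and $\mathbf{d}_i$ (respectively, $\mathbf{s}_{\mathcal{A}}$ and $\mathbf{d}_{\mathcal{A}}$). We assume that $(\mathcal{K}_1, \ldots, \mathcal{K}_g)$ is such that, for each $i$, either $\mathcal{E}_1^i \cup \mathcal{E}_2^i \subseteq \mathcal{K}_i$ or $\mathcal{K}_i = \emptyset$. Note that we still need $|\mathcal{E}_1| + |\mathcal{E}_2| = \ell_1 + \ell_2 < k$ in order to have a non-zero secure file size. Then,

$$\begin{aligned}
H(\mathbf{f}^s) &= H(\mathbf{f}^s \mid \mathbf{s}_{\mathcal{E}_1}, \mathbf{d}_{\mathcal{E}_2}) && (48) \\
&= H(\mathbf{f}^s \mid \mathbf{s}_{\mathcal{E}_1}, \mathbf{d}_{\mathcal{E}_2}) - H(\mathbf{f}^s \mid \mathbf{s}_{\mathcal{E}_1}, \mathbf{d}_{\mathcal{E}_2}, \mathbf{s}_{\mathcal{K}}) && (49) \\
&= I(\mathbf{f}^s; \mathbf{s}_{\mathcal{K}} \mid \mathbf{s}_{\mathcal{E}_1}, \mathbf{d}_{\mathcal{E}_2}) \\
&\le H(\mathbf{s}_{\mathcal{K}} \mid \mathbf{s}_{\mathcal{E}_1}, \mathbf{d}_{\mathcal{E}_2}) \\
&= H(\mathbf{s}_{\mathcal{K}_1}, \ldots, \mathbf{s}_{\mathcal{K}_g} \mid \mathbf{s}_{\mathcal{E}_1^1}, \ldots, \mathbf{s}_{\mathcal{E}_1^g}, \mathbf{d}_{\mathcal{E}_2^1}, \ldots, \mathbf{d}_{\mathcal{E}_2^g}) \\
&\le \sum_{i=1}^{g} H\bigl(\mathbf{s}_{\mathcal{K}_i} \mid \mathbf{s}_{\mathcal{E}_1^i}, \mathbf{d}_{\mathcal{E}_2^i}\bigr), && (50)
\end{aligned}$$

where (48) follows from the secrecy constraint, and (49) follows from the data collector's ability to obtain the whole data. Since we get one such upper bound for each choice of $(\{\mathcal{E}_1^i\}_{i=1}^{g}, \{\mathcal{E}_2^i\}_{i=1}^{g}, \{\mathcal{K}_i\}_{i=1}^{g})$, we have

$$H(\mathbf{f}^s) \le \min_{(\{\mathcal{E}_1^i\}_{i=1}^{g}, \{\mathcal{E}_2^i\}_{i=1}^{g}, \{\mathcal{K}_i\}_{i=1}^{g}) \in \mathcal{X}} \ \sum_{i=1}^{g} H\bigl(\mathbf{s}_{\mathcal{K}_i} \mid \mathbf{s}_{\mathcal{E}_1^i}, \mathbf{d}_{\mathcal{E}_2^i}\bigr),$$

where $\mathcal{X}$ consists of all choices of $(\{\mathcal{E}_1^i\}_{i=1}^{g}, \{\mathcal{E}_2^i\}_{i=1}^{g}, \{\mathcal{K}_i\}_{i=1}^{g})$ which satisfy the requirements mentioned above. ■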
Now we consider two cases depending on the number of local parities per local group. The analysis of the first case, a single parity node per local group, shows that the performance of such coding schemes degrades substantially in the presence of an eavesdropper that can observe the data downloaded during node repairs. The second case, multiple parity nodes per local group, allows node repair to be performed with smaller repair bandwidth, which results in lower leakage to eavesdroppers observing downloaded data. In both cases, we use the vectors $\mathbf{l}_1 = (l_1^1, \ldots, l_1^g)$ and $\mathbf{l}_2 = (l_2^1, \ldots, l_2^g)$ to represent a pattern of eavesdropped nodes.

A. Case 1: $\delta = 2$

Consider locally repairable codes presented in fPHl . which 
correspond to $\delta = 2$. For such codes, during node repair a newcomer node downloads all the data stored on the other nodes in the local group it belongs to. Since the data on each node in a local group is a function of the data stored on any set of $r$ nodes in that local group, all the information in the group is revealed to an eavesdropper that observes the data downloaded during a single node repair. In other words, we have $H(\mathbf{s}_{\mathcal{G}_i} \mid \mathbf{d}_{\mathcal{E}_2^i}) = 0 \Rightarrow H(\mathbf{s}_{\mathcal{K}_i} \mid \mathbf{d}_{\mathcal{E}_2^i}) = 0$ when $\mathcal{E}_2^i \ne \emptyset$. Accordingly, consider the eavesdropping pattern $\mathbf{l}_2 = (1, 1, \ldots, 1, 0, \ldots, 0)$ with ones in the first $\ell_2$ positions and $\mathbf{l}_1 = (0, \ldots, 0, l_1^{\ell_2+1}, \ldots, l_1^{g})$ with zeros in the first $\ell_2$ positions. Moreover, consider a data collector which accesses the set of nodes used in the proof of Theorem 14. This eavesdropping pattern and this node access pattern of the data collector, along with (50), give us the following upper bound on the amount of information that can be stored securely on a DSS that employs an $(n, k, r, \delta = 2)$ locally repairable code:

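$$H(\mathbf{f}^s) \le \left( \left\lfloor \frac{n-d_{\min}+1}{r+1} \right\rfloor r + h - (\ell_2 r + \ell_1) \right) \alpha, \qquad (51)$$

where $h = n - d_{\min} + 1 - (r+1)\left\lfloor \frac{n-d_{\min}+1}{r+1} \right\rfloor \le r$.

In order to see that the above bound is tight, we present a coding scheme which allows a file of size $\left( \left\lfloor \frac{n-d_{\min}+1}{r+1} \right\rfloor r + h - (\ell_2 r + \ell_1) \right) \alpha$ symbols to be securely stored against an $(\ell_1, \ell_2)$-eavesdropper. Take a secure file $\mathbf{f}^s$ of size $\left( \left\lfloor \frac{n-d_{\min}+1}{r+1} \right\rfloor r + h - (\ell_2 r + \ell_1) \right) \alpha$ and $(\ell_2 r + \ell_1)\alpha$ random symbols $\mathbf{r} = (r_1, \ldots, r_{(\ell_2 r + \ell_1)\alpha})$. We construct a linearized polynomial $f(y)$ with the $\mathcal{M} = \left( \left\lfloor \frac{n-d_{\min}+1}{r+1} \right\rfloor r + h \right) \alpha$ symbols (including both the secure file and the random symbols) as its coefficients, and evaluate the polynomial at $\mathcal{M} = \left( \left\lfloor \frac{n-d_{\min}+1}{r+1} \right\rfloor r + h \right) \alpha$ points that are linearly independent over $\mathbb{F}_q$.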
These $\mathcal{M}$ symbols (evaluations of $f(y)$) are subsequently encoded with a minimum distance optimal $(r, \delta = 2, \alpha, d = r, \beta = \alpha)$ locally repairable code for an $(n, k)$ DSS, e.g., the coding scheme proposed in [13]. It follows from Lemma 3 that the file is secured against an $(\ell_1, \ell_2)$-eavesdropper if (i) $H(\mathbf{e}) \le H(\mathbf{r})$ (which is trivially true, as the eavesdropper observes at most $(\ell_2 r + \ell_1)\alpha$ linearly independent symbols) and (ii) $H(\mathbf{r} \mid \mathbf{u}, \mathbf{e}) = 0$. It remains to show that the latter requirement also holds. We first note that, as the outer code is essentially an MRD code, it can be viewed as an MDS code. Thus, given $\mathbf{u}$, the original data symbols, the eavesdropper can remove the contribution of the monomials associated with the secure data symbols from the evaluations of $f(y)$, and can then recover the random symbols from the remaining polynomial at hand. (Note that, given $\mathbf{u}$, the eavesdropper has $(\ell_2 r + \ell_1)\alpha$ linearly independent evaluations of the reduced polynomial to solve for $(\ell_2 r + \ell_1)\alpha$ random symbols.) Thus, we obtain $H(\mathbf{r} \mid \mathbf{u}, \mathbf{e}) = 0$, which establishes the secrecy claim of the proposed scheme.
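The secrecy argument relies on the fact that a linearized polynomial is additive over the base field, so that the contribution of known coefficients can be subtracted from the evaluations. As a minimal sketch of this property, the code below works in the toy field $\mathbb{F}_{2^4}$ (with the reduction polynomial $x^4 + x + 1$, a choice made here purely for illustration and far smaller than the fields the construction requires) and evaluates $f(y) = \sum_i c_i y^{2^i}$. The coefficient values and function names are hypothetical.

```python
def gf16_mul(a, b):
    """Multiply two elements of GF(2^4), represented as 4-bit integers, modulo x^4 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:          # degree reached 4: reduce by x^4 + x + 1 (0b10011)
            a ^= 0x13
    return result


def linearized_eval(coeffs, y):
    """Evaluate f(y) = sum_i coeffs[i] * y^(2^i) over GF(2^4)."""
    acc = 0
    power = y                 # holds y^(2^i), starting with i = 0
    for c in coeffs:
        acc ^= gf16_mul(c, power)
        power = gf16_mul(power, power)   # Frobenius step: square to get y^(2^(i+1))
    return acc


# Hypothetical coefficients: think of the first two as file symbols, the last two as random symbols.
coeffs = [3, 7, 1, 9]
additive = all(linearized_eval(coeffs, a ^ b) == linearized_eval(coeffs, a) ^ linearized_eval(coeffs, b)
               for a in range(16) for b in range(16))
print(additive)
```

Because evaluation is additive in both the argument and the coefficients, an observer who knows the file coefficients can subtract their contribution and is left with evaluations of the polynomial formed by the random coefficients alone, which is the step used in the argument above.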

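Corollary 18. For an $(n, k)$ DSS employing an $(r, \delta = 2, \alpha, d, \beta)$ locally repairable code, the secrecy capacity against an $(\ell_1, \ell_2)$-eavesdropper is given by

$$\mathcal{M}^s = \left( \left\lfloor \frac{n-d_{\min}+1}{r+1} \right\rfloor r + h - (\ell_2 r + \ell_1) \right) \alpha. \qquad (52)$$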
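To make the cost of download-eavesdropping concrete for $\delta = 2$, the expression above can be evaluated for a few eavesdropper strengths. The sketch assumes the reading $\mathcal{M}^s = (\lfloor (n-d_{\min}+1)/(r+1) \rfloor r + h - (\ell_2 r + \ell_1))\alpha$ with $h$ the remainder of $n-d_{\min}+1$ modulo $r+1$. The function name, the clamping at zero, and the parameters $n = 12$, $d_{\min} = 4$, $r = 3$, $\alpha = 1$ are illustrative assumptions, not values from the paper.

```python
def secure_file_size_delta2(n, d_min, r, alpha, l1, l2):
    """Secure file size for delta = 2 under the reading of (52) stated above.

    l1: number of storage-eavesdropped nodes, l2: number of download-eavesdropped nodes.
    The result is clamped at zero (an assumption added here) when the eavesdropper
    observes at least as much as a data collector.
    """
    full_groups, h = divmod(n - d_min + 1, r + 1)
    return max(0, (full_groups * r + h - (l2 * r + l1)) * alpha)


# Hypothetical parameters: each download-eavesdropped node costs r symbols per unit alpha,
# while each storage-eavesdropped node costs only one.
for l1, l2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print((l1, l2), secure_file_size_delta2(n=12, d_min=4, r=3, alpha=1, l1=l1, l2=l2))
```

The comparison between the $(1, 0)$ and $(0, 1)$ eavesdroppers shows the degradation discussed above: observing one repair reveals an entire local group.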



B. Case 2: $\delta > 2$ and $\alpha = \mathcal{M}/k$

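In this case, we assume that each node repair within a local group is performed in a bandwidth efficient manner. Therefore, in each group we can apply the result of Theorem 5 to get

$$H\bigl(\mathbf{s}_{\mathcal{K}_i} \mid \mathbf{s}_{\mathcal{E}_1^i}, \mathbf{d}_{\mathcal{E}_2^i}\bigr) \le \sum_{j=1}^{\min\{|\mathcal{K}_i|, r\} - (l_1^i + l_2^i)} \bigl(\alpha - \theta(\alpha, \beta^*, l_2^i)\bigr), \qquad (53)$$

where $\theta(\alpha, \beta^*, l_2^i)$ is the amount of information that an eavesdropper receives from one intact node (a node not eavesdropped) during the repair of $l_2^i$ nodes in the $i$-th local group. Next, we consider the data collector associated with the pattern $(\mathcal{K}_1, \ldots, \mathcal{K}_g)$ used in the proof of Theorem 13 and the following eavesdropping pattern associated with $\mathbf{l}_2$:

$$l_2^{1} = \cdots = l_2^{\rho} = s + 1, \qquad l_2^{\rho+1} = \cdots = l_2^{\left\lfloor \frac{n-d_{\min}+1}{r+\delta-1} \right\rfloor} = s, \qquad \text{and} \qquad l_2^{\left\lfloor \frac{n-d_{\min}+1}{r+\delta-1} \right\rfloor + 1} = \nu. \qquad (54)$$

Here we assume that $\ell_2 = s \left\lfloor \frac{n-d_{\min}+1}{r+\delta-1} \right\rfloor + \rho + \nu$, for some $(s, \rho, \nu)$ satisfying $0 \le \rho + \nu \le s$ and $\nu \le h$. Combining (53) and (54), we get

$$\begin{aligned}
H(\mathbf{f}^s) \le{} & \sum_{i=1}^{\rho} \bigl(r - (l_1^i + s + 1)\bigr)\bigl(\alpha - \theta(\alpha, \beta^*, s+1)\bigr) \\
& + \sum_{i=\rho+1}^{\left\lfloor \frac{n-d_{\min}+1}{r+\delta-1} \right\rfloor} \bigl(r - (l_1^i + s)\bigr)\bigl(\alpha - \theta(\alpha, \beta^*, s)\bigr) \\
& + \Bigl(\min\{r, h\} - \bigl(l_1^{\left\lfloor \frac{n-d_{\min}+1}{r+\delta-1} \right\rfloor + 1} + \nu\bigr)\Bigr)\bigl(\alpha - \theta(\alpha, \beta^*, \nu)\bigr). \qquad (55)
\end{aligned}$$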

If we further assume that the encoding process within each local group is a linear array code (MDS by the definition of $(r, \delta)$ locality) and that $d = r + \delta - 2$ within each local group for node repair (i.e., all the live local nodes are contacted for repair), then, similar to Corollary 8, it follows from Lemma 7 that for $l_2^i \le 2$,

$$\theta(\alpha, \beta^*, l_2^i) \ge \begin{cases} \beta^*, & \text{if } l_2^i = 1, \\[4pt] 2\beta^* - \dfrac{\alpha}{(\delta-1)^2}, & \text{if } l_2^i = 2. \end{cases} \qquad (56)$$

Now (53) and (56) can be combined to obtain a bound on $H(\mathbf{f}^s)$.
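As an illustration of how the per-group bound and the leakage bound interact, the sketch below evaluates them for one local group. It assumes $d = r + \delta - 2$, so that $\beta^* = \alpha/(\delta-1)$, reads the leakage bound as $\theta \ge \beta^*$ for one observed repair and $\theta \ge 2\beta^* - \alpha/(\delta-1)^2$ for two, and takes $\theta = 0$ when no repair is observed (an assumption added here). The function names are hypothetical, and the sample values $r = 3$, $\delta = 3$, $\alpha = 4$ match the zigzag example used earlier.

```python
from fractions import Fraction


def theta_lower_bound(alpha, delta, l2):
    """Lower bound on the leakage from one intact node when l2 <= 2 repairs are observed."""
    beta_star = Fraction(alpha, delta - 1)     # assumes d = r + delta - 2
    if l2 == 0:
        return Fraction(0)                     # assumption: no observed repair, no download leakage
    if l2 == 1:
        return beta_star
    if l2 == 2:
        return 2 * beta_star - Fraction(alpha, (delta - 1) ** 2)
    raise ValueError("the bound is only stated for l2 <= 2")


def group_secrecy_bound(r, alpha, delta, l1, l2):
    """Upper bound on one group's contribution when the data collector reads r of its nodes."""
    usable_nodes = max(r - (l1 + l2), 0)
    return usable_nodes * (alpha - theta_lower_bound(alpha, delta, l2))


for l1, l2 in [(0, 0), (1, 0), (0, 1), (0, 2)]:
    print((l1, l2), group_secrecy_bound(r=3, alpha=4, delta=3, l1=l1, l2=l2))
```

Since $\theta$ is bounded from below, substituting the lower bound keeps the result a valid upper bound on the group's contribution to the secure file size.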

Next, we present a code construction for securely storing data against an eavesdropper when $\ell_2 \le 2\left\lfloor \frac{n-d_{\min}+1}{r+\delta-1} \right\rfloor$ and $l_2^i \le 2$. We take a file with its size $\mathcal{M}^s$ equal to the right hand side expression in (55), and $\mathcal{M} - \mathcal{M}^s = \left\lfloor \frac{n-d_{\min}+1}{r+\delta-1} \right\rfloor r\alpha + \min\{h, r\}\alpha - \mathcal{M}^s$ i.i.d. uniform random symbols. Note that $\mathcal{M}$ is equal to the upper bound in (44). Now we encode these $\mathcal{M}$ symbols, i.e., the secure data symbols and the random symbols, using the two step encoding scheme presented in Section IV-D. In particular, we employ an $(r + \delta - 1, r)$ zigzag code within each local group in the second stage of the encoding process. The secrecy and optimality claims of the proposed scheme under the given assumption on $\ell_2$ follow from the linearized property of the MRD codes (used in the first stage of encoding) and the analysis given in Section III-B. We present this in the following.

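Corollary 19. For an $(n, k)$ DSS employing an $(r, \delta > 2, \alpha, d, \beta^*)$ locally repairable bandwidth efficient code, the secrecy capacity against an $(\ell_1, \ell_2)$-eavesdropper with $\ell_2 \le 2\left\lfloor \frac{n-d_{\min}+1}{r+\delta-1} \right\rfloor$ and $\ell_1 + \ell_2 < k$ is given by

$$\begin{aligned}
\mathcal{M}^s ={} & \sum_{i=1}^{\rho} \bigl(r - (l_1^i + l_2^i)\bigr)\bigl(\alpha - \theta(\alpha, \beta^*, l_2^i)\bigr) \\
& + \sum_{i=\rho+1}^{\left\lfloor \frac{n-d_{\min}+1}{r+\delta-1} \right\rfloor} \bigl(r - (l_1^i + l_2^i)\bigr)\bigl(\alpha - \theta(\alpha, \beta^*, l_2^i)\bigr) \\
& + \Bigl(\min(r, h) - \bigl(l_1^{\left\lfloor \frac{n-d_{\min}+1}{r+\delta-1} \right\rfloor + 1} + l_2^{\left\lfloor \frac{n-d_{\min}+1}{r+\delta-1} \right\rfloor + 1}\bigr)\Bigr)\Bigl(\alpha - \theta\bigl(\alpha, \beta^*, l_2^{\left\lfloor \frac{n-d_{\min}+1}{r+\delta-1} \right\rfloor + 1}\bigr)\Bigr), \qquad (57)
\end{aligned}$$

where $\sum_i l_1^i = \ell_1$, $l_1^i + l_2^i \le r$, $l_2^i \le 2$, and the $l_2^i$ are as given in (54).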

VI. Conclusion 

Distributed storage systems store data on multiple nodes. These systems not only require resilience against node failures but, due to their distributed nature, may also have to satisfy security and locality constraints. Regenerating codes proposed for DSS address node failure resilience while efficiently trading off storage against repair bandwidth. In this paper, we considered security and locality aspects of coding schemes for DSS. The eavesdropper model analyzed in this paper belongs to the class of passive attack models, where the eavesdroppers observe the content of the nodes in the system. Accordingly, we considered an $(\ell_1, \ell_2)$-eavesdropper, where the content of any $\ell_1$ nodes and the downloaded information for any $\ell_2$ nodes are leaked to the eavesdropper. With such an eavesdropper model, we first focused on the classical setup, which is resilient against a single node failure at a time (without locality constraints). Noting that the secrecy capacity of this setting is open at the minimum storage regenerating point, we provided upper bounds on the secure file size and established the secrecy capacity for any $(\ell_1, \ell_2)$ with $\ell_2 \le 2$. Our coding scheme achieving this result also provides a better rate compared to the existing schemes. Then, we shifted focus to the locality constraint and studied the general scenario of having multiple parity nodes per local group. For this setting, we derived a new minimum distance bound for locally repairable codes and presented a $d_{\min}$-optimal coding scheme. Similar to the trade-off analysis for the classical setup, we then studied bandwidth efficient locally repairable codes, for which we proposed a new bound and a coding scheme that is both $d_{\min}$-optimal and repair bandwidth efficient. This bandwidth efficient locally repairable setting was also analyzed under security constraints, for which we presented a secure file size upper bound and codes achieving the bound, and hence established the secrecy capacity, under special cases.

We list some avenues for further research here. 1) We first note that the novel bound that we establish for the minimum storage point allows for counting part of the downloaded data as additional leakage, and hence provides a tighter bound than the existing ones. Yet, we have not established the tightness of the bound for $\ell_2 \ge 3$. Thus, new codes or improved bounds are of definite interest for secure MSR codes. 2) For locally repairable codes, we utilized MRD coding as the secrecy precoding, which requires extended field sizes. Designing codes that achieve the stated bounds with smaller field sizes is an interesting problem. 3) One can also consider cooperative (or multiple simultaneous node failure) repair [41]-[43] in a DSS. Secure code design in such a scenario was recently considered in [44]. Codes having both cooperative and locally repairable features can be studied. As with other distributed systems, storage systems may exhibit simultaneous node failures that need to be recovered with local connections. To the best of our knowledge, this setting has not been studied (even without security constraints). Our ongoing efforts are on the design of coding schemes for DSS satisfying these properties.

Appendix A 
Proof of Lemma 3

Proof: The proof follows from the classical techniques 
given by 0, where instead of 0-leakage, e-leakage rate is 
considered. (The application of this technique in DSS is first 
considered in [11].) We have 

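$$\begin{aligned}
I(\mathbf{u}; \mathbf{e}) &= H(\mathbf{e}) - H(\mathbf{e} \mid \mathbf{u}) && (58) \\
&\overset{(a)}{\le} H(\mathbf{e}) - H(\mathbf{e} \mid \mathbf{u}) + H(\mathbf{e} \mid \mathbf{u}, \mathbf{r}) && (59) \\
&\overset{(b)}{\le} H(\mathbf{r}) - I(\mathbf{e}; \mathbf{r} \mid \mathbf{u}) && (60) \\
&\overset{(c)}{=} H(\mathbf{r} \mid \mathbf{u}, \mathbf{e}) && (61) \\
&\overset{(d)}{=} 0, && (62)
\end{aligned}$$

where (a) follows by the non-negativity of $H(\mathbf{e} \mid \mathbf{u}, \mathbf{r})$, (b) is the condition $H(\mathbf{e}) \le H(\mathbf{r})$, (c) is due to $H(\mathbf{r} \mid \mathbf{u}) = H(\mathbf{r})$ as $\mathbf{r}$ and $\mathbf{u}$ are independent, and (d) is the condition $H(\mathbf{r} \mid \mathbf{u}, \mathbf{e}) = 0$. ■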
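The two conditions of the lemma can be checked by brute force on a toy example. The sketch below is an illustration constructed here, not taken from the paper: $\mathbf{u}$ and $\mathbf{r}$ are independent uniform bits, and the eavesdropper observes either $\mathbf{e} = \mathbf{u} \oplus \mathbf{r}$ (a one-time pad, for which both conditions hold) or $\mathbf{e} = \mathbf{u}$ (for which the second condition fails). All function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(samples, idx):
    """Joint entropy (in bits) of the coordinates `idx` over equally likely samples."""
    counts = Counter(tuple(s[i] for i in idx) for s in samples)
    total = len(samples)
    return -sum(c / total * log2(c / total) for c in counts.values())


def analyze(observe):
    """Return (I(u;e), H(e), H(r), H(r|u,e)) when the eavesdropper sees e = observe(u, r)."""
    samples = [(u, r, observe(u, r)) for u, r in product((0, 1), repeat=2)]
    U, R, E = 0, 1, 2
    leakage = entropy(samples, [U]) + entropy(samples, [E]) - entropy(samples, [U, E])
    h_r_given_ue = entropy(samples, [U, R, E]) - entropy(samples, [U, E])
    return leakage, entropy(samples, [E]), entropy(samples, [R]), h_r_given_ue


print(analyze(lambda u, r: u ^ r))   # masked observation
print(analyze(lambda u, r: u))       # unmasked observation
```

In the masked case $H(\mathbf{e}) \le H(\mathbf{r})$ and $H(\mathbf{r} \mid \mathbf{u}, \mathbf{e}) = 0$ hold and the leakage is zero; in the unmasked case $H(\mathbf{r} \mid \mathbf{u}, \mathbf{e}) = 1$ and one bit leaks.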

Appendix B 
Proof of Lemma 7

Proof: We prove the lemma for $n - k = 2$, i.e., for a $(k+2, k)$-DSS. The proof extends to a higher number of parities in a straightforward manner. Consider the following encoding matrix of the $(k+2, k)$ linear code employed by the DSS:

$$G = \begin{bmatrix} I & 0 & \cdots & 0 \\ 0 & I & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & I \\ A_1 & A_2 & \cdots & A_k \\ B_1 & B_2 & \cdots & B_k \end{bmatrix}. \qquad (63)$$

Assume that a newcomer node downloads $S_{1,j}\mathbf{x}_{k+1}$ and $S_{2,j}\mathbf{x}_{k+2}$ from the first and the second parity nodes during the repair of the $j$-th systematic node. Here $S_{1,j} = V_{k+1,j}$ and $S_{2,j} = V_{k+2,j}$ are $\frac{\alpha}{2} \times \alpha$ matrices. In order to be able to perform bandwidth efficient repair using interference alignment, $\{S_{1,j}\}_{j=1}^{k}$ and $\{S_{2,j}\}_{j=1}^{k}$ satisfy

$$\operatorname{rank} \begin{bmatrix} S_{1,j} A_i \\ S_{2,j} B_i \end{bmatrix} = \frac{\alpha}{2}, \qquad \forall\, i \in [k] \setminus \{j\}, \qquad (64)$$

and

$$\operatorname{rank} \begin{bmatrix} S_{1,j} A_j \\ S_{2,j} B_j \end{bmatrix} = \alpha. \qquad (65)$$

Note that the data downloaded from the $i$-th systematic node ($i \ne j$) for node repair is $V_{i,j}\mathbf{x}_i = V_{i,j}\mathbf{f}_i$. Since the repair matrix of node $i$ associated with the repair of the $j$-th node is $V_{i,j}$, we have

$$V_{i,j} = S_{1,j} A_i = S_{2,j} B_i. \qquad (66)$$

Note that the above relationship is among subspaces. As pointed out earlier in the text, we use uppercase letters to represent both matrices and the row spaces associated with those matrices. Using induction, we now show the main claim of Lemma 7. Note that this proof is a modification of the proof of Lemma 10 in [31].

Base case ($|\mathcal{A}| = 1$): The statement of Lemma 7 is true for this case, as we perform a bandwidth efficient node repair, where each remaining node contributes $\frac{\alpha}{2}$ independent symbols for a single node repair.

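Inductive step: Now we assume that the statement of Lemma 7 is true for all sets $\mathcal{A} \subseteq [k]$ with $|\mathcal{A}| \le m - 1$ and prove it for all sets of indices of size $m$. Without loss of generality, we prove this for $\mathcal{A} = [m]$. We know from the inductive hypothesis that

$$\dim\Bigl(\bigcap_{j \in [m-1]} V_{i,j}\Bigr) = \operatorname{rank}\Bigl(\bigcap_{j \in [m-1]} V_{i,j}\Bigr) \le \frac{\alpha}{2^{m-1}}. \qquad (67)$$

Now assume that the result is false for $\mathcal{A} = [1 : m]$, i.e.,

$$\dim\Bigl(\bigcap_{j \in [m]} V_{i,j}\Bigr) = \operatorname{rank}\Bigl(\bigcap_{j \in [m]} V_{i,j}\Bigr) = \operatorname{rank}\Bigl(\bigcap_{j \in [m]} S_{1,j} A_i\Bigr) = \operatorname{rank}\Bigl(\bigcap_{j \in [m]} S_{2,j} B_i\Bigr) > \frac{\alpha}{2^{m}}. \qquad (68)$$

Since $A_i$ and $B_i$ are invertible, we have $\operatorname{rank}\bigl(\bigcap_{j \in [m]} S_{1,j} A_i\bigr) = \operatorname{rank}\bigl(\bigcap_{j \in [m]} S_{1,j}\bigr)$ and $\operatorname{rank}\bigl(\bigcap_{j \in [m]} S_{2,j} B_i\bigr) = \operatorname{rank}\bigl(\bigcap_{j \in [m]} S_{2,j}\bigr)$. Next, consider

$$\Bigl(\bigcap_{j \in [m]} S_{1,j}\Bigr) A_m = \bigcap_{j \in [m]} S_{1,j} A_m \subseteq \bigcap_{j \in [m-1]} S_{1,j} A_m = \bigcap_{j \in [m-1]} V_{m,j}. \qquad (69)$$

Here, the above equation describes the relationship among the row spaces of the participating matrices. Similarly, we have the following:

$$\Bigl(\bigcap_{j \in [m]} S_{2,j}\Bigr) B_m \subseteq \bigcap_{j \in [m-1]} V_{m,j}. \qquad (70)$$

Moreover, it follows from (68) and the full-rankness of $A_m$ and $B_m$ that

$$\dim\Bigl(\Bigl(\bigcap_{j \in [m]} S_{1,j}\Bigr) A_m\Bigr) = \dim\Bigl(\Bigl(\bigcap_{j \in [m]} S_{2,j}\Bigr) B_m\Bigr) > \frac{\alpha}{2^{m}}. \qquad (71)$$

Thus, we have two subspaces $\bigl(\bigcap_{j \in [m]} S_{1,j}\bigr) A_m$ and $\bigl(\bigcap_{j \in [m]} S_{2,j}\bigr) B_m$ of dimension strictly greater than $\frac{\alpha}{2^{m}}$ (see (71)), which are contained in the subspace $\bigcap_{j \in [m-1]} V_{m,j}$ (see (69) and (70)), whose dimension is at most $\frac{\alpha}{2^{m-1}}$ by the inductive hypothesis. Therefore,

$$\Bigl(\Bigl(\bigcap_{j \in [m]} S_{1,j}\Bigr) A_m\Bigr) \cap \Bigl(\Bigl(\bigcap_{j \in [m]} S_{2,j}\Bigr) B_m\Bigr) \ne \{0\} \ \Rightarrow\ S_{1,m} A_m \cap S_{2,m} B_m \ne \{0\}, \qquad (72)$$

which is in contradiction with (65). This implies that

$$\dim\Bigl(\bigcap_{j \in [m]} V_{i,j}\Bigr) \le \frac{\alpha}{2^{m}}.$$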
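The contradiction in the inductive step rests on a dimension count: two subspaces, each of dimension greater than half that of a common ambient space, must intersect nontrivially, since $\dim(U \cap W) = \dim U + \dim W - \dim(U + W)$. The sketch below checks this over $\mathbb{F}_2$ for a toy ambient space of dimension 3. The choice of field, the bitmask representation of vectors, and the function names are assumptions made here for illustration only.

```python
def rank_gf2(vectors):
    """Rank over GF(2) of vectors encoded as integer bitmasks (Gaussian elimination)."""
    pivots = {}                        # leading bit position -> reduced basis vector
    for v in vectors:
        while v:
            lead = v.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = v       # new pivot: v is independent of the current basis
                break
            v ^= pivots[lead]          # eliminate the leading bit and keep reducing
    return len(pivots)


def intersection_dim(U, W):
    """dim(span(U) ∩ span(W)) = dim U + dim W - dim(U + W)."""
    return rank_gf2(U) + rank_gf2(W) - rank_gf2(list(U) + list(W))


# Two 2-dimensional subspaces of a 3-dimensional space over GF(2).
U = [0b100, 0b010]
W = [0b010, 0b001]
print(intersection_dim(U, W))
```

Any pair of 2-dimensional subspaces of a 3-dimensional space shares at least one nonzero vector, which is the same counting argument applied above to the two subspaces of dimension greater than $\alpha/2^m$ inside a space of dimension at most $\alpha/2^{m-1}$.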